LawInstruct: The First Large-Scale Dataset of Legal Instructions

Date: a year ago
Size: 9.84 GB
Organization: Stanford University
Paper URL: arxiv.org

LawInstruct is the first large-scale instruction dataset for the legal domain. It was created jointly by Stanford University, Johns Hopkins University, and other institutions, and was released in April 2024. It aims to fill gaps in existing legal task datasets and to accelerate the development of models for the legal domain.

  1. Dataset Characteristics:
    • Coverage: LawInstruct covers 17 jurisdictions and 24 languages, which makes it broadly applicable and diverse.
    • Scale and diversity: It contains 12 million training examples covering a variety of legal tasks, such as question answering, entailment, summarization, and information extraction.
  2. Dataset Structure:
    • Each example follows a custom instruction format, which keeps the data consistent and easy to use.
    • It integrates 58 high-quality annotated datasets spanning different legal tasks and specialist fields.
  3. Technical Implementation:
    • The authors used MultiLegalPile, a 689 GB multilingual legal corpus, as pre-training material for the models.
  4. Performance Improvements:
    • Instruction tuning on LawInstruct significantly improves the balanced accuracy of the Flan-T5 XL model on LegalBench, indicating that the dataset has a positive effect on model performance.
  5. Research and Papers:
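The balanced accuracy mentioned under Performance Improvements can be illustrated with a short sketch. It assumes the standard definition (the mean of per-class recall), and the labels below are made-up toy data, not LegalBench results. The function name `balanced_accuracy` is chosen here for illustration only.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that appear in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    total = defaultdict(int)    # number of examples per true class
    correct = defaultdict(int)  # correctly predicted examples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy data (assumption): an imbalanced task where the model always says "yes".
y_true = ["yes"] * 8 + ["no"] * 2
y_pred = ["yes"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case plain accuracy rewards always predicting the majority class, while balanced accuracy weights each class equally. That equal weighting is useful on benchmarks with skewed label distributions.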
LawInstruct.torrent
Seeding: 1 | Downloading: 0 | Completed: 163 | Total Downloads: 386
  • LawInstruct/
    • README.md (2.09 KB)
    • README.txt (4.18 KB)
    • data/
      • lawinstruct.zip (9.84 GB)
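Once the torrent has finished downloading, the archive can be inspected without extracting all 9.84 GB. The sketch below uses only Python's standard `zipfile` module. The helper name `list_archive` is hypothetical, the path is assumed to mirror the file tree above, and nothing is assumed about the files inside the archive.

```python
import zipfile
from pathlib import Path


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each file in a zip."""
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


if __name__ == "__main__":
    # Assumed location, following the torrent's directory layout.
    archive_path = Path("LawInstruct/data/lawinstruct.zip")
    if archive_path.exists():
        for name, size in list_archive(archive_path):
            print(f"{size:>15,d}  {name}")
    else:
        print(f"{archive_path} not found; download the torrent first")
```

Listing members first makes it possible to extract only the files of interest with `ZipFile.extract` rather than unpacking everything.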
