Date

2 years ago

Size

1.83 MB

Organization

Publish URL

github.com

Tags

LLM

Natural Language Processing

Multi-Task Learning

IEPile is a large-scale, high-quality bilingual (Chinese and English) instruction fine-tuning dataset for information extraction (IE), developed by Zhejiang University. It covers three core subtasks: named entity recognition (NER), relation extraction (RE), and event extraction (EE). The dataset contains about 2 million instruction samples, totaling about 320 million tokens, and spans several domains, including general, medical, and financial text. The research team integrated 26 English and 7 Chinese IE datasets and built the instructions with their proposed "schema-based polling instruction construction method", which has two parts: constructing a hard negative sample dictionary and generating polling instructions. According to the authors, training on IEPile improves the performance of large language models on information extraction tasks, particularly their zero-shot generalization, and the dataset is offered as a resource for information extraction research.
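The construction method named above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function name `build_instructions`, the `HARD_NEGATIVES` dictionary, the prompt wording, the JSON field names, and the reading of "polling" as cycling through the label set in fixed-size batches. It only shows the general idea of pairing each gold label with confusable labels and then emitting one instruction per batch of labels.

```python
import json
from typing import Dict, List


def build_instructions(
    text: str,
    gold: Dict[str, List[str]],
    hard_negatives: Dict[str, List[str]],
    batch_size: int = 2,
) -> List[dict]:
    """Emit one instruction sample per fixed-size batch of schema labels."""
    # Labels to query: each gold label followed by its hard negatives
    # (labels assumed to be easily confused with it), without duplicates.
    labels: List[str] = []
    for label in gold:
        for candidate in [label, *hard_negatives.get(label, [])]:
            if candidate not in labels:
                labels.append(candidate)

    samples = []
    # Walk over the label list batch by batch ("polling", as assumed here).
    for start in range(0, len(labels), batch_size):
        batch = labels[start:start + batch_size]
        prompt = {
            "instruction": "Extract entities of the listed types.",  # hypothetical wording
            "schema": batch,
            "input": text,
        }
        # Labels with no gold mentions get an empty list.
        answer = {label: gold.get(label, []) for label in batch}
        samples.append({
            "instruction": json.dumps(prompt, ensure_ascii=False),
            "output": json.dumps(answer, ensure_ascii=False),
        })
    return samples


# Hypothetical hard negative dictionary: label -> confusable labels.
HARD_NEGATIVES = {"person": ["organization"], "location": ["facility"]}

samples = build_instructions(
    "Alan Turing was born in London.",
    {"person": ["Alan Turing"], "location": ["London"]},
    HARD_NEGATIVES,
)
for sample in samples:
    print(sample["instruction"])
    print(sample["output"])
```

In this sketch the four labels are split into two instructions, so each prompt asks about a small, fixed number of labels, and every prompt contains at least one label the text does not express. The actual batch size, negative sampling, and prompt templates used for IEPile are not stated on this page.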

IEPile.torrent

Seeding: 1, Downloading: 0, Completed: 345, Total Downloads: 750

IEPile/
- README.md
  1.47 KB
- README.txt
  2.94 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at support@hyper.ai for prompt review and removal.

Related Datasets

Sutra 10B Pretraining Teaching and Training Dataset

2 months ago

RoVid-X Robot Video Generation Dataset

2 months ago

Nemotron-Math-v2 Mathematical Inference Dataset

5 months ago

GroundingME Complex Scene Understanding Evaluation Dataset

5 months ago

TxT360-3efforts Multi-Task Inference Dataset

5 months ago


IEPile Large-Scale Information Extraction Corpus
