Date

2 years ago

Size

80.66 MB

Organization

Paper URL

arxiv.org

Tags

LLM

Natural Language Processing

Benchmarks

LooGLE is a benchmark dataset proposed by the Beijing Institute for General Artificial Intelligence (BIGAI) and the Institute for Artificial Intelligence at Peking University for testing and evaluating the long-context understanding capabilities of large language models (LLMs). The authors evaluated 9 of the most popular long-context LLMs and found that these models performed poorly on complex long-dependency tasks: multiple information retrieval, timeline reordering, computation, and comprehension and reasoning. Commercial models (Claude3-200k, GPT4-32k, GPT4-8k, GPT3.5-turbo-16k, LlamaIndex) had an average accuracy of only 40%, and open-source models (ChatGLM2-6B, LongLLaMa-3B, RWKV-4-14B-pile, LLaMA-7B-32K) had an accuracy of only 10%.

The research paper is titled "LooGLE: Can Long-Context Language Models Understand Long Contexts?" and has been accepted at ACL 2024. The co-first authors are Jiaqi Li and Mengmeng Wang of BIGAI, and the corresponding authors are Zilong Zheng, a researcher at BIGAI, and Muhan Zhang, an assistant professor at Peking University.

LooGLE addresses the shortcomings of previous datasets by providing ultra-long texts, drawing on relatively recent documents, and including carefully designed and annotated tasks with genuine long dependencies. The benchmark provides a new tool for evaluating and improving long-context LLMs and points to directions for further work on long-text language processing.

LooGLE.torrent

Seeding: 1, Downloading: 0, Completed: 223, Total Downloads: 316

LooGLE/
- README.md
  2.01 KB
- README.txt
  4.02 KB

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at support@hyper.ai for prompt review and removal.


Related Datasets

Groundsource Global Flood Events Dataset

2 months ago

CL-bench Context Learning Evaluation Benchmark Dataset

2 months ago

GroundingME Complex Scene Understanding Evaluation Dataset

4 months ago

LongBench-Pro Long Context Comprehensive Evaluation Dataset

4 months ago

LooGLE Long Context Understanding Ability Benchmark Dataset
