LooGLE Long Context Understanding Ability Benchmark Dataset

Date: a year ago

Size: 80.66 MB

Paper URL: arxiv.org

LooGLE is a benchmark dataset proposed by the Beijing Institute for General Artificial Intelligence (BIGAI) and the Institute for Artificial Intelligence at Peking University for testing and evaluating the long-context understanding capabilities of large language models (LLMs).

The LooGLE evaluation covered nine of the most popular long-context LLMs and found that they performed poorly on complex long-dependency tasks involving multi-information retrieval, timeline reordering, computation, and comprehension and reasoning. The commercial models (Claude3-200k, GPT4-32k, GPT4-8k, GPT3.5-turbo-6k, LlamaIndex) averaged only 40% accuracy, and the open-source models (ChatGLM2-6B, LongLLaMa-3B, RWKV-4-14B-pile, LLaMA-7B-32K) only 10%.

The paper "LooGLE: Can Long-Context Language Models Understand Long Contexts?" has been accepted by ACL 2024. The co-authors of the paper are Li Jiaqi and Wang Mengmeng of the Beijing Institute for General Artificial Intelligence, and the corresponding authors are Zheng Zilong, a researcher at the same institute, and Zhang Muhan, an assistant professor at Peking University.

LooGLE addresses the shortcomings of previous datasets by providing ultra-long texts, drawing on relatively recent documents, and including carefully designed and annotated tasks with genuine long dependencies. The benchmark gives researchers a new tool for evaluating and improving long-context LLMs, and it points to a new direction for the development of AI language processing technology.

LooGLE.torrent
  • LooGLE/
    • README.md (2.01 KB)
    • README.txt (4.02 KB)
    • data/
      • LooGLE.zip (80.66 MB)
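
Once the torrent has finished downloading, it can help to check what the archive contains before extracting it. The following is a minimal sketch in Python that uses only the standard-library `zipfile` module. The path `LooGLE/data/LooGLE.zip` is taken from the file listing above; the helper name `list_archive` is made up for this illustration, and nothing is assumed about the files inside the archive.

```python
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return (member name, uncompressed size in bytes) for each file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]


if __name__ == "__main__":
    # Assumed location, based on the file listing; adjust to your download directory.
    zip_path = Path("LooGLE") / "data" / "LooGLE.zip"
    if zip_path.exists():
        for name, size in list_archive(zip_path):
            print(f"{size:>12,d}  {name}")
    else:
        print(f"{zip_path} not found; download the torrent first.")
```

Listing the members first avoids unpacking the full archive just to find out how the data is organised.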
