Date

8 months ago

Organization

Paper URL

2509.02473

License

CC BY 4.0

Dataset structure

The dataset contains three task types:

Single-choice questions: 579 questions, each with exactly one correct answer. They mainly test the model's understanding of database concepts and SQL queries.
Multiple-choice questions (Multiple): 760 complex questions, each of which may have more than one correct answer. The answers include precise numerical results and reasoning-based conclusions, and the questions evaluate the model's combined data analysis and reasoning ability.
Report generation (report): 668 questions that require a detailed analysis report. They test a data agent's ability to carry out comprehensive analysis across multiple data sources, and a standard report is provided as the baseline for comparative evaluation.
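
The three counts above can be tallied to see the overall size of the benchmark and how the questions are split across task types. The sketch below is only an illustration: the dictionary keys and the helper name `task_shares` are assumptions made here, not field names taken from the dataset files.

```python
# Question counts per task type, as stated in the dataset description.
# The key names are illustrative labels, not confirmed dataset fields.
TASK_COUNTS = {
    "single_choice": 579,
    "multiple_choice": 760,
    "report": 668,
}


def task_shares(counts):
    """Return each task type's fraction of the total question count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(TASK_COUNTS.values())
shares = task_shares(TASK_COUNTS)

for name, n in TASK_COUNTS.items():
    print(f"{name}: {n} questions ({shares[name]:.1%})")
print(f"total: {total} questions")
```

Under these counts the benchmark has 2,007 questions in total, with multiple-choice questions forming the largest group.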

This dataset is contributed by community users and is intended for educational and informational purposes only. If any content involves copyright infringement, please contact us at support@hyper.ai for prompt review and removal.

Related Datasets

DRACO Cross-Disciplinary Deep Research Benchmark Dataset

2 months ago

ToolACE Complex Tools Learning Dialogue Dataset

2 months ago

Sutra 10B Pretraining Teaching and Training Dataset

2 months ago

THINGS-EEG EEG Dataset

4 months ago

THINGS-MEG Magnetoencephalography Dataset

4 months ago

THINGS-fMRI Functional Magnetic Resonance Imaging Dataset

4 months ago

CL-bench Context Learning Evaluation Benchmark Dataset

4 months ago

RoVid-X Robot Video Generation Dataset

2 months ago

GroundingME Complex Scene Understanding Evaluation Dataset

5 months ago

MCIF Multimodal Cross-Language Instruction Following Dataset

5 months ago

TxT360-3efforts Multi-Task Inference Dataset

5 months ago

X-ray Contraband Detection Dataset

5 months ago

LongBench-Pro Long Context Comprehensive Evaluation Dataset

5 months ago

FDAbench-Full Heterogeneous Data Analysis Benchmark Dataset
