DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Mingxuan Du Benfeng Xu Chiwei Zhu Xiaorui Wang Zhendong Mao

Abstract

Deep Research Agents (DRAs) are a prominent category of LLM-based agents. By autonomously orchestrating multistep web exploration, targeted retrieval, and higher-order synthesis, they transform vast amounts of online information into analyst-grade, citation-rich reports, compressing hours of manual desk research into minutes. However, a comprehensive benchmark for systematically evaluating the capabilities of these agents remains absent. To bridge this gap, we present DeepResearch Bench, a benchmark consisting of 100 PhD-level research tasks, each meticulously crafted by domain experts across 22 distinct fields. Evaluating DRAs is inherently complex and labor-intensive, so we propose two novel methodologies that achieve strong alignment with human judgment. The first is a reference-based method with adaptive criteria that assesses the quality of generated research reports. The second evaluates a DRA's information retrieval and collection capabilities by measuring its effective citation count and overall citation accuracy. We have open-sourced DeepResearch Bench and key components of these frameworks at https://github.com/Ayanami0730/deep_research_bench to accelerate the development of practical LLM-based agents.
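
The abstract names the two evaluation methods but gives no formulas. What follows is a minimal sketch under stated assumptions and is not the benchmark's actual implementation. Several elements are illustrative assumptions that do not come from the abstract: the reference-based score is taken as a weighted average over criteria, normalized against a reference report as t / (t + r). Citation accuracy is taken as supported citations divided by total unique citations. The effective citation count is taken as the number of supported citations. The `Citation` class, the `supported` verdict, the function names, and the criterion names and weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Citation:
    """A (statement, URL) pair extracted from a report.

    `supported` is a hypothetical verdict on whether the cited page
    actually backs the statement (how it is obtained is out of scope here).
    """
    statement: str
    url: str
    supported: bool


def citation_metrics(citations: List[Citation]) -> Dict[str, float]:
    """Assumed definitions: accuracy = supported / total and
    effective_citations = number of supported citations, both computed
    after dropping duplicate (statement, url) pairs."""
    unique = list({(c.statement, c.url): c for c in citations}.values())
    total = len(unique)
    supported = sum(1 for c in unique if c.supported)
    accuracy = supported / total if total else 0.0
    return {"accuracy": accuracy, "effective_citations": float(supported)}


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores; weights need not sum to 1."""
    total_w = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total_w


def relative_score(target: Dict[str, float], reference: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Score a target report against a reference report, in [0, 1].

    Assumed rule t / (t + r): 0.5 means parity with the reference.
    """
    t = weighted_score(target, weights)
    r = weighted_score(reference, weights)
    return t / (t + r) if (t + r) else 0.5


if __name__ == "__main__":
    # Hypothetical criteria and weights, chosen only for illustration.
    weights = {"comprehensiveness": 0.3, "depth": 0.3,
               "instruction_following": 0.2, "readability": 0.2}
    target = {"comprehensiveness": 8, "depth": 6,
              "instruction_following": 9, "readability": 7}
    reference = {"comprehensiveness": 7, "depth": 7,
                 "instruction_following": 8, "readability": 6}
    print(relative_score(target, reference, weights))

    cits = [Citation("a", "u1", True), Citation("a", "u1", True),
            Citation("b", "u2", False), Citation("c", "u3", True)]
    print(citation_metrics(cits))
```

Normalizing against a reference report keeps scores comparable across tasks of different difficulty. Benchmark-level figures would then presumably be averages of these per-report values over all tasks, though that aggregation step is also an assumption.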

Code Repositories

ayanami0730/deep_research_bench (official)