6 months ago

Abstract

From professional research to everyday planning, many tasks are bottlenecked by wide-scale information seeking, which is more repetitive than cognitively complex. With the rapid development of Large Language Models (LLMs), automated search agents powered by LLMs offer a promising solution to liberate humans from this tedious work. However, the capability of these agents to perform such "wide-context" collection reliably and completely remains largely unevaluated due to a lack of suitable benchmarks. To bridge this gap, we introduce WideSearch, a new benchmark engineered to evaluate agent reliability on these large-scale collection tasks. The benchmark features 200 manually curated questions (100 in English, 100 in Chinese) from over 15 diverse domains, grounded in real user queries. Each task requires agents to collect large-scale atomic information, which could be verified one by one objectively, and arrange it into a well-organized output. A rigorous five-stage quality control pipeline ensures the difficulty, completeness, and verifiability of the dataset. We benchmark over 10 state-of-the-art agentic search systems, including single-agent, multi-agent frameworks, and end-to-end commercial systems. Most systems achieve overall success rates near 0%, with the best performer reaching just 5%. However, given sufficient time, cross-validation by multiple human testers can achieve a near 100% success rate. These results demonstrate that present search agents have critical deficiencies in large-scale information seeking, underscoring urgent areas for future research and development in agentic search. Our dataset, evaluation pipeline, and benchmark results have been publicly released at https://widesearch-seed.github.io/

Ryan Wong, Jiawei Wang, Junjie Zhao, Li Chen, Yan Gao, Long Zhang, Xuan Zhou, Zuo Wang, Kai Xiang, Ge Zhang, and 3 more

WideSearch: Benchmarking Agentic Broad Info-Seeking
