WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Abstract

The advent of Large Language Model (LLM)-powered agents has revolutionized artificial intelligence by enabling solutions to complex, open-ended tasks through web-based information-seeking (IS) capabilities. The scarcity of high-quality training data has limited the development of IS agents. Existing approaches typically adopt an information-driven paradigm that first collects web data and then generates questions based on the retrieved content. However, this may lead to inconsistencies between the information structure and the reasoning structure, and between questions and answers. To mitigate this, we propose WebShaper, a formalization-driven IS data synthesis framework, and use it to construct a dataset. WebShaper systematically formalizes IS tasks through set theory. Central to the formalization is the concept of Knowledge Projections (KP), which enables precise control over the reasoning structure through compositions of KP operations. During synthesis, we begin by creating seed tasks and then apply a multi-step expansion process. At each step, an agentic Expander uses retrieval and validation tools to expand the current formal question into a more complex one, based on our formalization. We train our model on the synthesized dataset. Experimental results demonstrate that WebShaper achieves state-of-the-art performance among open-source IS agents on the GAIA and WebWalkerQA benchmarks.
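The abstract does not spell out the formalization, so the following is only a rough illustration under stated assumptions. It assumes that a knowledge projection maps a set of target entities to every entity linked to one of them by a relation, that intersection composes several constraints into one question, and that an expansion step replaces a constant entity with a sub-question, accepted only if that sub-question resolves to exactly the same entity, so the answer is preserved while the reasoning chain gets longer. The toy knowledge base, entity names, and functions (`evaluate`, `expand`, `validated_expand`, `hops`) are hypothetical and are not WebShaper's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple, Union

# Hypothetical toy knowledge base: relation name -> set of (subject, object) pairs.
KB: Dict[str, Set[Tuple[str, str]]] = {
    "born_in": {("player_A", "city_P"), ("player_B", "city_P"), ("player_C", "city_Q")},
    "played_for": {("player_A", "club_X"), ("player_B", "club_Y"), ("player_C", "club_X")},
    "based_in": {("club_X", "city_Q"), ("club_Y", "city_P")},
}

@dataclass(frozen=True)
class Const:
    """A fixed set of entities named directly in the question."""
    entities: FrozenSet[str]

@dataclass(frozen=True)
class Proj:
    """Assumed knowledge projection: entities u with (u, v) in relation for some v in arg."""
    relation: str
    arg: "Expr"

@dataclass(frozen=True)
class And:
    """Intersection of two constraints on the same answer entity."""
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Proj, And]

def evaluate(expr: Expr) -> Set[str]:
    """Resolve a formal question to its answer set against the toy KB."""
    if isinstance(expr, Const):
        return set(expr.entities)
    if isinstance(expr, Proj):
        targets = evaluate(expr.arg)
        return {u for (u, v) in KB[expr.relation] if v in targets}
    if isinstance(expr, And):
        return evaluate(expr.left) & evaluate(expr.right)
    raise TypeError(f"unknown expression: {expr!r}")

def hops(expr: Expr) -> int:
    """Length of the longest chain of projections (a rough reasoning-depth measure)."""
    if isinstance(expr, Const):
        return 0
    if isinstance(expr, Proj):
        return 1 + hops(expr.arg)
    return max(hops(expr.left), hops(expr.right))

def expand(expr: Expr, old: Const, new: Expr) -> Expr:
    """Replace every occurrence of the constant `old` with the sub-question `new`."""
    if expr == old:
        return new
    if isinstance(expr, Proj):
        return Proj(expr.relation, expand(expr.arg, old, new))
    if isinstance(expr, And):
        return And(expand(expr.left, old, new), expand(expr.right, old, new))
    return expr

def validated_expand(expr: Expr, old: Const, new: Expr) -> Expr:
    """Toy stand-in for validation: only accept a sub-question that pins down `old` exactly."""
    if evaluate(new) != set(old.entities):
        raise ValueError("sub-question does not resolve to the replaced entity")
    return expand(expr, old, new)

# Seed: "Which player was born in city_P and played for club_X?"
club_x = Const(frozenset({"club_X"}))
seed = And(Proj("born_in", Const(frozenset({"city_P"}))), Proj("played_for", club_x))

# Expansion: describe club_X indirectly as "the club based in city_Q".
expanded = validated_expand(seed, club_x, Proj("based_in", Const(frozenset({"city_Q"}))))

print(evaluate(seed), hops(seed))          # {'player_A'} 1
print(evaluate(expanded), hops(expanded))  # {'player_A'} 2
```

The validation check is the key design choice in this sketch: because a replacement is accepted only when the sub-question resolves to exactly the entity it stands in for, the expanded question keeps the seed's answer while requiring one more hop of reasoning. Passing a sub-question that resolves to a different club (for example, "the club based in city_P") raises `ValueError` instead of silently changing the answer.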
