HyperAI


4 months ago

Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology


Abstract

Models like OpenAI-o3 pioneer visual grounded reasoning by dynamically referencing visual regions, much like humans "thinking with images". However, no benchmark exists to evaluate these capabilities holistically. To bridge this gap, we propose TreeBench (Traceable Evidence Evaluation Benchmark), a diagnostic benchmark built on three principles: (1) focused visual perception of subtle targets in complex scenes, (2) traceable evidence via bounding box evaluation, and (3) second-order reasoning to test object interactions and spatial hierarchies beyond simple object localization. Prioritizing images with dense objects, we initially sample 1K high-quality images from SA-1B and engage eight LMM experts to manually annotate questions, candidate options, and answers for each image. After three stages of quality control, TreeBench consists of 405 challenging visual question-answering pairs. Even the most advanced models struggle with this benchmark: none of them reach 60% accuracy, and OpenAI-o3, for example, scores only 54.87%. Furthermore, we introduce TreeVGR (Traceable Evidence Enhanced Visual Grounded Reasoning), a training paradigm that supervises localization and reasoning jointly with reinforcement learning, enabling accurate localization and explainable reasoning pathways. Initialized from Qwen2.5-VL-7B, it improves V* Bench (+16.8), MME-RealWorld (+12.6), and TreeBench (+13.4), demonstrating that traceability is key to advancing vision-grounded reasoning. The code is available at https://github.com/Haochen-Wang409/TreeVGR.
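The abstract names two mechanisms without spelling them out: bounding-box evaluation of traceable evidence, and a reinforcement-learning objective that supervises localization and reasoning jointly. The following is a minimal sketch under our own assumptions, not TreeBench's actual metric or TreeVGR's actual reward: boxes are pixel-space `(x1, y1, x2, y2)` rectangles, evidence is scored by IoU matched in both directions, and the hypothetical `reward` function (with a hypothetical `loc_weight` parameter) simply adds answer correctness to a weighted localization term.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp at zero so degenerate boxes contribute no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def matched_mean_iou(src: Sequence[Box], dst: Sequence[Box]) -> float:
    """For each box in src, take its best IoU against dst, then average over src."""
    if not src or not dst:
        return 0.0
    return sum(max(iou(s, d) for d in dst) for s in src) / len(src)


def evidence_score(pred: Sequence[Box], gt: Sequence[Box]) -> float:
    """Hypothetical evidence score: mean of a recall-style term (is every
    ground-truth box covered?) and a precision-style term (is every
    predicted box justified?)."""
    recall = matched_mean_iou(gt, pred)
    precision = matched_mean_iou(pred, gt)
    return 0.5 * (recall + precision)


def reward(answer_correct: bool, pred: Sequence[Box], gt: Sequence[Box],
           loc_weight: float = 1.0) -> float:
    """Hypothetical scalar RL reward: 1 for a correct answer, plus a
    weighted localization term so grounding is rewarded alongside accuracy."""
    return float(answer_correct) + loc_weight * evidence_score(pred, gt)


if __name__ == "__main__":
    gt = [Box(0, 0, 10, 10)]
    # One exact box plus one spurious box: full recall, half precision.
    print(evidence_score([Box(0, 0, 10, 10), Box(50, 50, 60, 60)], gt))  # 0.75
    print(reward(True, gt, gt))  # 2.0
```

Averaging the two directions is a design choice: a recall-only score would let a model cover the target by emitting many boxes, while a precision-only score would reward a single safe box that misses other required evidence.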
