Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Yifan Shen, Yuanzhe Liu, Jingyuan Zhu, Xu Cao, Xiaofeng Zhang, Yixiao He, Wenming Ye, James Matthew Rehg, Ismini Lourentzou


Abstract

Current Vision-Language Models (VLMs) struggle with fine-grained spatial reasoning, particularly when multi-step logic and precise spatial alignment are required. In this work, we introduce SpatialReasoner-R1, a vision-language reasoning model designed to address these limitations. To construct high-quality supervision for spatial reasoning, we design a Multi-Model Monte Carlo Tree Search (M3CTS) method that generates diverse, logically consistent Long Chain-of-Thought (LongCoT) reasoning trajectories. In addition, we propose fine-grained Direct Preference Optimization (fDPO), which introduces segment-specific preference granularity for descriptive grounding and logical reasoning, guided by a spatial reward mechanism that evaluates candidate responses based on visual consistency, spatial grounding, and logical coherence. Experimental results demonstrate that fDPO achieves an average improvement of 4.1% over standard DPO across spatial quality tasks, and a 9.0% gain in spatial quantity tasks. SpatialReasoner-R1, trained with fDPO, sets a new SoTA on SPATIALRGPT-Bench, outperforming the strongest baseline by 9.8% in average accuracy, while maintaining competitive performance on general vision-language tasks.
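The abstract describes fDPO as standard DPO with segment-specific preference granularity, distinguishing descriptive-grounding segments from logical-reasoning segments. The exact formulation is not given in the abstract, so the sketch below is only illustrative: it applies the usual DPO objective at the segment level, with a hypothetical per-segment preference-strength parameter (`seg_betas`) standing in for the paper's segment-specific granularity. All function and variable names here are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def segment_dpo_loss(logp_chosen_seg, logp_rejected_seg,
                     ref_logp_chosen_seg, ref_logp_rejected_seg,
                     seg_betas):
    """Illustrative segment-level DPO loss (assumed form, not the paper's code).

    Each *_seg tensor holds summed token log-probabilities per response
    segment, shape (num_segments,). seg_betas assigns a preference strength
    to each segment, e.g. one value for descriptive-grounding segments and
    another for logical-reasoning segments.
    """
    # Implicit-reward margins relative to the frozen reference policy.
    chosen_ratio = logp_chosen_seg - ref_logp_chosen_seg
    rejected_ratio = logp_rejected_seg - ref_logp_rejected_seg
    margins = seg_betas * (chosen_ratio - rejected_ratio)
    # Standard DPO applies -log(sigmoid) to one sequence-level margin;
    # averaging per-segment terms gives finer-grained credit assignment.
    return -F.logsigmoid(margins).mean()

# Toy usage: two segments, chosen response preferred in both.
loss = segment_dpo_loss(
    torch.tensor([1.0, 2.0]),   # policy log-probs, chosen segments
    torch.tensor([0.5, 0.5]),   # policy log-probs, rejected segments
    torch.zeros(2),             # reference log-probs, chosen
    torch.zeros(2),             # reference log-probs, rejected
    torch.tensor([0.1, 0.5]),   # per-segment beta (hypothetical values)
)
```

Widening the margin between chosen and rejected segments drives the loss toward zero, so segments with larger betas contribute a stronger preference signal.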
