a month ago

TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He

Abstract

Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplored. Through detailed empirical analyses, we identify that existing PRMs, though widely adopted for supervising text-only reasoning steps, struggle with table-specific operations such as sub-table retrieval and schema interaction, leading to critical performance bottlenecks. To address this limitation, we propose TaTToo, a novel table-grounded PRM framework that (i) reasons explicitly over tabular reasoning steps and (ii) integrates tool-based verification to provide precise reward supervision. Concretely, we first design a scalable data curation pipeline that constructs over 60k high-quality step-level annotations by integrating table verification rationales with tool-based executions. Building on the collected data, we train TaTToo with a dual-stage paradigm: cold-start supervised fine-tuning to capture tool-use reasoning patterns, followed by reinforcement learning with tool-grounded reward shaping to align our model with table-based verification. We provide a comprehensive evaluation of the policy improvement induced by our newly designed PRM. Across 5 challenging tabular reasoning benchmarks covering numerical reasoning, fact-checking, and data analysis, TaTToo improves downstream policy LRMs by 30.9% at inference, surpasses strong PRM baselines such as Qwen-2.5-Math-PRM-72B with only 8B parameters, and demonstrates strong generalizability across diverse TTS strategies.
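
To make the role of a PRM in test-time scaling concrete, the following is a minimal sketch of Best-of-N selection, one common TTS strategy (the abstract does not say which strategies were evaluated). Everything here is an illustrative assumption rather than the TaTToo implementation: the function names, the `min` aggregation of step rewards, and the toy scorer `toy_prm`, which merely stands in for a trained, tool-grounded reward model.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (all steps of a candidate, index of the step to score) -> reward.
StepScorer = Callable[[List[str], int], float]


def aggregate_step_rewards(step_rewards: Sequence[float], how: str = "min") -> float:
    """Collapse per-step rewards into one trajectory score (aggregation choice is an assumption)."""
    if not step_rewards:
        raise ValueError("a candidate must contain at least one step")
    if how == "min":
        return min(step_rewards)
    if how == "mean":
        return sum(step_rewards) / len(step_rewards)
    if how == "last":
        return step_rewards[-1]
    raise ValueError(f"unknown aggregation: {how}")


def best_of_n(candidates: List[List[str]], score_step: StepScorer, how: str = "min") -> int:
    """Return the index of the candidate whose aggregated step reward is highest."""
    scores = []
    for steps in candidates:
        rewards = [score_step(steps, i) for i in range(len(steps))]
        scores.append(aggregate_step_rewards(rewards, how))
    return max(range(len(candidates)), key=scores.__getitem__)


def toy_prm(steps: List[str], i: int) -> float:
    """Toy stand-in for a PRM: favors steps that record a tool execution."""
    return 0.9 if steps[i].startswith("tool:") else 0.4


candidates = [
    ["look up the 2020 row", "guess that the total is 40"],
    ["tool: select rows where year == 2020", "tool: sum(revenue) = 42"],
]
best = best_of_n(candidates, toy_prm)
print(candidates[best])
```

In this sketch the policy model would produce `candidates` by sampling N reasoning trajectories for the same table question, and the PRM only reranks them. Using `min` penalizes any trajectory that contains a single weak step; `mean` and `last` are included as alternative aggregation rules.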
