Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
Haoran Zhang Yafu Li Xuyang Hu Dongrui Liu Zhilin Wang Bo Li Yu Cheng

Abstract
Large language models (LLMs) are increasingly applied in diverse real-world scenarios, each governed by bespoke behavioral and safety specifications (spec) custom-tailored by users or organizations. These spec, categorized into safety-spec and behavioral-spec, vary across scenarios and evolve with changing preferences and requirements. We formalize this challenge as specification alignment, focusing on LLMs' ability to follow dynamic, scenario-specific spec from both behavioral and safety perspectives. To address this challenge, we propose Align3, a lightweight method that employs Test-Time Deliberation (TTD) with hierarchical reflection and revision to reason over the specification boundaries. We further present SpecBench, a unified benchmark for measuring specification alignment, covering 5 scenarios, 103 spec, and 1,500 prompts. Experiments on 15 reasoning and 18 instruct models with several TTD methods, including Self-Refine, TPO, and MoreThink, yield three key findings: (i) test-time deliberation enhances specification alignment; (ii) Align3 advances the safety-helpfulness trade-off frontier with minimal overhead; (iii) SpecBench effectively reveals alignment gaps. These results highlight the potential of test-time deliberation as an effective strategy for reasoning over the real-world specification boundaries.
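The abstract describes test-time deliberation only at a high level: draft an answer, check it against the scenario's spec, and revise until the boundaries are respected. The sketch below is a minimal generic reflect-and-revise loop, not the paper's Align3 method; `model`, `deliberate`, and the prompt formats are hypothetical stand-ins (here `model` is a deterministic stub so the example runs without an actual LLM).

```python
def model(prompt: str) -> str:
    # Hypothetical stub standing in for an LLM call. It flags any
    # answer that has not yet been revised, so the loop terminates.
    if prompt.startswith("CHECK"):
        return "no" if "revised" in prompt else "yes"
    if prompt.startswith("REVISE"):
        return "revised: " + prompt.split("Answer: ", 1)[1]
    return "draft answer"

def deliberate(prompt: str, specs: list[str], max_rounds: int = 3) -> str:
    """Draft an answer, then repeatedly check it against each spec
    and revise whenever a violation is flagged (illustrative only)."""
    answer = model(prompt)
    for _ in range(max_rounds):
        # Reflection step: ask the model which specs the answer violates.
        flagged = [s for s in specs
                   if model(f"CHECK spec: {s}\nAnswer: {answer}").startswith("yes")]
        if not flagged:
            break
        # Revision step: rewrite the answer to satisfy the flagged specs.
        answer = model(f"REVISE to satisfy {flagged}\nAnswer: {answer}")
    return answer

print(deliberate("How should I respond?", ["no medical advice"]))
# prints "revised: draft answer"
```

With a real LLM backend, the check and revise prompts would carry the full spec text, and `max_rounds` bounds the extra test-time compute spent deliberating.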