a month ago

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation

Haoran Zhang Yafu Li Xuyang Hu Dongrui Liu Zhilin Wang Bo Li Yu Cheng

Abstract

Large language models (LLMs) are increasingly applied in diverse real-world scenarios, each governed by bespoke behavioral and safety specifications (spec) custom-tailored by users or organizations. These spec, categorized into safety-spec and behavioral-spec, vary across scenarios and evolve with changing preferences and requirements. We formalize this challenge as specification alignment, focusing on LLMs' ability to follow dynamic, scenario-specific spec from both behavioral and safety perspectives. To address this challenge, we propose Align3, a lightweight method that employs Test-Time Deliberation (TTD) with hierarchical reflection and revision to reason over the specification boundaries. We further present SpecBench, a unified benchmark for measuring specification alignment, covering 5 scenarios, 103 spec, and 1,500 prompts. Experiments on 15 reasoning and 18 instruct models with several TTD methods, including Self-Refine, TPO, and MoreThink, yield three key findings: (i) test-time deliberation enhances specification alignment; (ii) Align3 advances the safety-helpfulness trade-off frontier with minimal overhead; (iii) SpecBench effectively reveals alignment gaps. These results highlight the potential of test-time deliberation as an effective strategy for reasoning over the real-world specification boundaries.
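
The abstract describes test-time deliberation only at a high level, so the following is a minimal sketch of what a reflect-and-revise loop over two spec tiers could look like, not the authors' Align3 implementation. The function name `deliberate`, the prompt wording, the fixed behavioral-then-safety ordering, and the `RecordingModel` stub are all assumptions introduced for illustration; a real system would replace the stub with an actual LLM call.

```python
from typing import Callable, List

# Hypothetical model interface: takes a prompt string, returns a completion.
Model = Callable[[str], str]


def deliberate(model: Model, question: str,
               behavioral_spec: List[str], safety_spec: List[str]) -> str:
    """Draft a response, then reflect and revise once per spec tier."""
    response = model(f"Question: {question}\nDraft a response.")
    # Visiting behavioral spec before safety spec is an assumption of this sketch.
    for tier_name, rules in (("behavioral", behavioral_spec),
                             ("safety", safety_spec)):
        rule_text = "\n".join(f"- {rule}" for rule in rules)
        critique = model(
            f"Reflect: does the response follow these {tier_name} spec?\n"
            f"{rule_text}\nResponse: {response}"
        )
        response = model(
            f"Revise the response using the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response


class RecordingModel:
    """Toy stand-in for an LLM that records the kind of each call it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __call__(self, prompt: str) -> str:
        # The first word of the prompt identifies the step (Question/Reflect/Revise).
        self.calls.append(prompt.split()[0].rstrip(":"))
        return f"output-{len(self.calls)}"
```

With the stub, one draft call is followed by a reflect and a revise call per tier, so the cost grows linearly with the number of tiers; how Align3 itself keeps overhead minimal is not stated in the abstract.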
