VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions

Abstract
Spoken language models (SLMs) have emerged as a unified paradigm for speech understanding and generation, enabling natural human-machine interaction. However, while most progress has focused on semantic accuracy and instruction following, the ability of SLMs to adapt their speaking style based on spoken instructions has received limited attention. We introduce Voice Style Adaptation (VSA), a new task that examines whether SLMs can modify their speaking style, such as timbre, prosody, or persona, following natural-language spoken commands. To study this task, we present VStyle, a bilingual (Chinese & English) benchmark covering four categories of speech generation: acoustic attributes, natural language instruction, role play, and implicit empathy. We also introduce the Large Audio Language Model as a Judge (LALM as a Judge) framework, which progressively evaluates outputs along textual faithfulness, style adherence, and naturalness, ensuring reproducible and objective assessment. Experiments on commercial systems and open-source SLMs demonstrate that current models face clear limitations in controllable style adaptation, highlighting both the novelty and challenge of this task. By releasing VStyle and its evaluation toolkit, we aim to provide the community with a foundation for advancing human-centered spoken interaction. The dataset and code are publicly available at the project's homepage: https://junzhan2000.github.io/VStyle.github.io/.
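The progressive evaluation described for LALM as a Judge can be pictured as a gated pipeline: each criterion is scored in turn, and later criteria are judged only once earlier ones pass. The Python sketch below illustrates that control flow under stated assumptions; `StageResult`, `progressive_judge`, and the dummy stage functions are hypothetical illustrations, not the released VStyle toolkit API.

```python
# Minimal sketch of a progressive "LALM as a Judge" evaluation loop.
# All names below are hypothetical; the actual VStyle toolkit may differ.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class StageResult:
    stage: str       # e.g. "textual_faithfulness"
    passed: bool     # whether this stage's criterion is met
    score: float     # e.g. a 1-5 rating from the judge model
    rationale: str   # the judge's free-text explanation

# Each stage maps (generated audio, spoken instruction) to a StageResult;
# in practice it would wrap a call to a large audio-language model with a
# stage-specific rubric prompt.
JudgeStage = Callable[[bytes, str], StageResult]

def progressive_judge(audio: bytes, instruction: str,
                      stages: List[JudgeStage]) -> List[StageResult]:
    """Run judge stages in a fixed order, stopping at the first failure,
    so later criteria (style adherence, naturalness) are scored only
    once earlier ones (textual faithfulness) are satisfied."""
    results: List[StageResult] = []
    for stage in stages:
        result = stage(audio, instruction)
        results.append(result)
        if not result.passed:
            break  # skip later stages once a criterion fails
    return results

# Dummy stages standing in for real LALM calls.
def textual_faithfulness(audio: bytes, instruction: str) -> StageResult:
    return StageResult("textual_faithfulness", True, 4.5, "content matches")

def style_adherence(audio: bytes, instruction: str) -> StageResult:
    return StageResult("style_adherence", False, 2.0, "prosody unchanged")

def naturalness(audio: bytes, instruction: str) -> StageResult:
    return StageResult("naturalness", True, 4.0, "sounds fluent")

if __name__ == "__main__":
    verdicts = progressive_judge(b"...", "Whisper this sentence sadly.",
                                 [textual_faithfulness, style_adherence,
                                  naturalness])
    for v in verdicts:
        print(v.stage, v.passed, v.score, v.rationale)
    # Only the first two stages print: naturalness is skipped because
    # style adherence failed.
```

The early-exit design mirrors the paper's ordering of criteria: an output that is textually unfaithful is not worth scoring for style or naturalness, which keeps the judge's verdicts reproducible and comparable across systems.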