Advancing Speech Understanding in Speech-Aware Language Models with GRPO
Avishai Elmakies, Hagai Aronowitz, Nimrod Shabtay, Eli Schwartz, Ron Hoory, Avihu Dekel

Abstract
In this paper, we introduce a Group Relative Policy Optimization (GRPO)-based method for training Speech-Aware Large Language Models (SALLMs) on open-format speech understanding tasks, such as Spoken Question Answering and Automatic Speech Translation. SALLMs have proven highly effective for speech understanding tasks. GRPO has recently gained traction for its efficiency in training LLMs, and prior work has explored its application to SALLMs, primarily in multiple-choice tasks. Building on this, we focus on open-format tasks that better reflect the generative abilities of the models. Our approach leverages GRPO with BLEU as the reward signal to optimize SALLMs, and we demonstrate empirically that it surpasses standard SFT across several key metrics. Finally, we explore the potential of incorporating off-policy samples within GRPO for these tasks, highlighting avenues for further improvement and further research.
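The core idea described in the abstract, scoring a group of sampled completions with BLEU and normalizing the rewards within the group, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact BLEU variant, smoothing, and GRPO hyperparameters are not specified in the abstract, so the sketch uses a simple smoothed sentence-level BLEU and the standard group-relative advantage (reward minus group mean, divided by group standard deviation).

```python
# Hedged sketch: BLEU-as-reward with group-relative advantages, in the
# spirit of GRPO. The BLEU variant here (add-1 smoothed, up to 4-grams)
# is an assumption for illustration, not the paper's exact reward.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hyp, ref, max_n=4):
    """Smoothed sentence-level BLEU with a brevity penalty."""
    hyp_t, ref_t = hyp.split(), ref.split()
    if not hyp_t:
        return 0.0
    log_p = 0.0
    for n in range(1, max_n + 1):
        h, r = Counter(ngrams(hyp_t, n)), Counter(ngrams(ref_t, n))
        overlap = sum(min(c, r[g]) for g, c in h.items())
        total = max(len(hyp_t) - n + 1, 0)
        # add-1 smoothing so short hypotheses still get a nonzero score
        log_p += math.log((overlap + 1) / (total + 1))
    bp = min(1.0, math.exp(1 - len(ref_t) / len(hyp_t)))  # brevity penalty
    return bp * math.exp(log_p / max_n)

def group_relative_advantages(completions, reference):
    """GRPO-style advantage: (reward - group mean) / group std."""
    rewards = [sentence_bleu(c, reference) for c in completions]
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

In a full training loop, these advantages would weight the policy-gradient update for each sampled completion; here the sketch only shows how an n-gram-overlap metric like BLEU can serve as a verifiable reward for open-format generation, where exact-match rewards used in multiple-choice settings do not apply.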