Advancing Speech Understanding in Speech-Aware Language Models with GRPO
Avishai Elmakies, Hagai Aronowitz, Nimrod Shabtay, Eli Schwartz, Ron Hoory, Avihu Dekel

Abstract
In this paper, we introduce a Group Relative Policy Optimization (GRPO)-based method for training Speech-Aware Large Language Models (SALLMs) on open-format speech understanding tasks, such as Spoken Question Answering and Automatic Speech Translation. SALLMs have proven highly effective for speech understanding tasks. GRPO has recently gained traction for its efficiency in training LLMs, and prior work has explored its application to SALLMs, primarily in multiple-choice tasks. Building on this, we focus on open-format tasks that better reflect the generative abilities of the models. Our approach leverages GRPO with BLEU as the reward signal to optimize SALLMs, and we demonstrate empirically that it surpasses standard SFT across several key metrics. Finally, we explore the potential of incorporating off-policy samples within GRPO for these tasks, highlighting avenues for further improvement and further research.
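The core idea described in the abstract, scoring a group of sampled completions with BLEU and normalizing the rewards within the group, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact BLEU variant, smoothing, and GRPO hyperparameters are not specified in the abstract, so the sketch uses a simple smoothed sentence-level BLEU and the standard group-relative advantage (reward minus group mean, divided by group standard deviation).

```python
# Hedged sketch: BLEU-as-reward with group-relative advantages, in the
# spirit of GRPO. The BLEU variant here (add-1 smoothed, up to 4-grams)
# is an assumption for illustration, not the paper's exact reward.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hyp, ref, max_n=4):
    """Smoothed sentence-level BLEU with a brevity penalty."""
    hyp_t, ref_t = hyp.split(), ref.split()
    if not hyp_t:
        return 0.0
    log_p = 0.0
    for n in range(1, max_n + 1):
        h, r = Counter(ngrams(hyp_t, n)), Counter(ngrams(ref_t, n))
        overlap = sum(min(c, r[g]) for g, c in h.items())
        total = max(len(hyp_t) - n + 1, 0)
        # add-1 smoothing so short hypotheses still get a nonzero score
        log_p += math.log((overlap + 1) / (total + 1))
    bp = min(1.0, math.exp(1 - len(ref_t) / len(hyp_t)))  # brevity penalty
    return bp * math.exp(log_p / max_n)

def group_relative_advantages(completions, reference):
    """GRPO-style advantage: (reward - group mean) / group std."""
    rewards = [sentence_bleu(c, reference) for c in completions]
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

In a full training loop, these advantages would weight the policy-gradient update for each sampled completion; here the sketch only shows how an n-gram-overlap metric like BLEU can serve as a verifiable reward for open-format generation, where exact-match rewards used in multiple-choice settings do not apply.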