When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
Ammar Khairi, Daniel Dsouza, Ye Shen, Julia Kreutzer, Sara Hooker

Abstract
Recent advancements in large language models (LLMs) have shifted focus toward scaling inference-time compute, improving performance without retraining the model. A common approach is to sample multiple outputs in parallel and select one of these as the final output. However, work to date has focused on English and a handful of domains such as math and code. In contrast, we are most interested in techniques that generalize across open-ended tasks, formally verifiable tasks, and across languages. In this work, we study how to robustly scale inference-time compute for open-ended generative tasks in a multilingual, multi-task setting. Our findings show that both the sampling strategy, based on temperature variation, and the selection strategy must be adapted to account for diverse domains and varied language settings. We evaluate existing selection methods, revealing that strategies effective in English often fail to generalize across languages. We propose novel sampling and selection strategies specifically adapted for multilingual and multi-task inference scenarios, and show that they yield notable gains across languages and tasks. In particular, our combined sampling and selection methods lead to an average +6.8 jump in win-rates for our 8B models on m-ArenaHard-v2.0 prompts against proprietary models such as Gemini. At larger scale, Command-A (111B model), equipped with our methods, shows a +9.0 improvement in win-rates on the same benchmark with just five samples against single-sample decoding, a substantial increase at minimal cost. Our results underscore the need for language- and task-aware approaches to inference-time compute, aiming to democratize performance improvements in underrepresented languages.
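To make the parallel sample-then-select setup concrete, the following is a minimal sketch, not the paper's implementation: it draws n candidates at varied temperatures and keeps the highest-scoring one. The `generate` and `score` callables are hypothetical placeholders for an LLM call and a selector (e.g., a reward model or LLM judge); the toy stand-ins only exist so the sketch runs end to end.

```python
"""Hedged sketch of best-of-n sampling with temperature variation plus a selection step."""
import random
from typing import Callable, List, Tuple


def sample_candidates(
    prompt: str,
    generate: Callable[[str, float], str],  # hypothetical LLM call: (prompt, temperature) -> text
    temperatures: List[float],
) -> List[str]:
    """Draw one candidate per temperature; in practice these calls run in parallel."""
    return [generate(prompt, t) for t in temperatures]


def select_best(
    candidates: List[str],
    score: Callable[[str], float],  # hypothetical selector, e.g. a reward model or LLM judge
) -> Tuple[str, float]:
    """Return the highest-scoring candidate (a simple best-of-n selection rule)."""
    scored = [(c, score(c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy stand-ins so the sketch is runnable without a real model.
    def toy_generate(prompt: str, temperature: float) -> str:
        rng = random.Random(hash((prompt, temperature)) % (2**32))
        return f"candidate (t={temperature}, noise={rng.random():.3f})"

    def toy_score(text: str) -> float:
        return -len(text)  # placeholder scorer: prefers shorter outputs

    temps = [0.3, 0.6, 0.9, 1.1, 1.3]  # varied temperatures across the n samples
    candidates = sample_candidates("example prompt", toy_generate, temps)
    best, best_score = select_best(candidates, toy_score)
    print(best, best_score)
```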