QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Abstract
We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs). While RL is essential for LLMs' reasoning capabilities, it is resource-intensive, requiring substantial GPU memory and long rollout durations. QeRL addresses these issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA), accelerating the rollout phase of RL while reducing memory overhead. Beyond efficiency, our findings show that quantization noise increases policy entropy, enhancing exploration and enabling the discovery of better strategies during RL. To further optimize exploration, QeRL introduces an Adaptive Quantization Noise (AQN) mechanism, which dynamically adjusts noise during training. Experiments demonstrate that QeRL delivers over 1.5x speedup in the rollout phase. Moreover, it is the first framework to enable RL training of a 32B LLM on a single H100 80GB GPU, while delivering overall speedups for RL training. It also achieves faster reward growth and higher final accuracy than 16-bit LoRA and QLoRA, while matching the performance of full-parameter fine-tuning on mathematical benchmarks such as GSM8K (90.8%) and MATH 500 (77.4%) for a 7B model. These results establish QeRL as an efficient and effective framework for RL training of LLMs.
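To make the core idea concrete, the following is a minimal PyTorch sketch of a layer that combines a frozen (fake-)quantized base weight with a trainable LoRA adapter and an annealed noise term standing in for adaptive quantization noise. The names `fake_quantize_4bit`, `QuantLoRALinear`, and the exponential noise schedule are illustrative assumptions, not the paper's NVFP4 kernels or the exact AQN schedule.

```python
import torch
import torch.nn as nn

def fake_quantize_4bit(w: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric 4-bit quantization of a weight tensor (stand-in for NVFP4)."""
    scale = w.abs().max() / 7.0  # symmetric int4 range: [-7, 7]
    return torch.clamp(torch.round(w / scale), -7, 7) * scale

class QuantLoRALinear(nn.Module):
    """Frozen quantized base weight + trainable low-rank adapter + annealed exploration noise."""
    def __init__(self, in_features: int, out_features: int, rank: int = 16, noise_init: float = 1e-2):
        super().__init__()
        base = torch.randn(out_features, in_features) * 0.02
        self.register_buffer("w_q", fake_quantize_4bit(base))        # frozen quantized base weight
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # trainable LoRA factors
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.noise_scale = noise_init  # adjusted externally over RL training steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.w_q
        if self.training and self.noise_scale > 0:
            # Per-output-channel perturbation: a rough stand-in for adaptive quantization noise (AQN)
            w = w + torch.randn_like(w[:, :1]) * self.noise_scale
        return x @ (w + self.lora_b @ self.lora_a).T

# Usage: decay the noise scale across RL training steps (hypothetical exponential schedule)
layer = QuantLoRALinear(1024, 1024)
for step in range(3):
    layer.noise_scale = 1e-2 * (0.5 ** step)
    y = layer(torch.randn(2, 1024))
```

In this sketch only the LoRA factors receive gradients, so optimizer state stays small, while the noise injected through the frozen quantized weight raises policy entropy early in training and is annealed away as the policy improves.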