QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Abstract
We propose QeRL, a Quantization-enhanced Reinforcement Learning framework for large language models (LLMs). While RL is essential for LLMs' reasoning capabilities, it is resource-intensive, requiring substantial GPU memory and long rollout durations. QeRL addresses these issues by combining NVFP4 quantization with Low-Rank Adaptation (LoRA), accelerating the rollout phase of RL while reducing memory overhead. Beyond efficiency, our findings show that quantization noise increases policy entropy, enhancing exploration and enabling the discovery of better strategies during RL. To further optimize exploration, QeRL introduces an Adaptive Quantization Noise (AQN) mechanism, which dynamically adjusts noise during training. Experiments demonstrate that QeRL delivers over 1.5x speedup in the rollout phase. Moreover, it is the first framework to enable RL training of a 32B LLM on a single H100 80GB GPU, while delivering overall speedups for RL training. It also achieves faster reward growth and higher final accuracy than 16-bit LoRA and QLoRA, while matching the performance of full-parameter fine-tuning on mathematical benchmarks such as GSM8K (90.8%) and MATH 500 (77.4%) for a 7B model. These results establish QeRL as an efficient and effective framework for RL training of LLMs.
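To make the core idea concrete, the following is a minimal PyTorch sketch of a layer that combines a frozen (fake-)quantized base weight with a trainable LoRA adapter and an annealed noise term standing in for adaptive quantization noise. The names `fake_quantize_4bit`, `QuantLoRALinear`, and the exponential noise schedule are illustrative assumptions, not the paper's NVFP4 kernels or the exact AQN schedule.

```python
import torch
import torch.nn as nn

def fake_quantize_4bit(w: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric 4-bit quantization of a weight tensor (stand-in for NVFP4)."""
    scale = w.abs().max() / 7.0  # symmetric int4 range: [-7, 7]
    return torch.clamp(torch.round(w / scale), -7, 7) * scale

class QuantLoRALinear(nn.Module):
    """Frozen quantized base weight + trainable low-rank adapter + annealed exploration noise."""
    def __init__(self, in_features: int, out_features: int, rank: int = 16, noise_init: float = 1e-2):
        super().__init__()
        base = torch.randn(out_features, in_features) * 0.02
        self.register_buffer("w_q", fake_quantize_4bit(base))        # frozen quantized base weight
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)  # trainable LoRA factors
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.noise_scale = noise_init  # adjusted externally over RL training steps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.w_q
        if self.training and self.noise_scale > 0:
            # Per-output-channel perturbation: a rough stand-in for adaptive quantization noise (AQN)
            w = w + torch.randn_like(w[:, :1]) * self.noise_scale
        return x @ (w + self.lora_b @ self.lora_a).T

# Usage: decay the noise scale across RL training steps (hypothetical exponential schedule)
layer = QuantLoRALinear(1024, 1024)
for step in range(3):
    layer.noise_scale = 1e-2 * (0.5 ** step)
    y = layer(torch.randn(2, 1024))
```

In this sketch only the LoRA factors receive gradients, so optimizer state stays small, while the noise injected through the frozen quantized weight raises policy entropy early in training and is annealed away as the policy improves.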