RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

Abstract
Recent progress in vision and language foundation models has significantly advanced multimodal understanding, reasoning, and generation, inspiring a surge of interest in extending such capabilities to embodied settings through vision-language-action (VLA) models. Yet, most VLA models are still trained with supervised fine-tuning (SFT), which struggles to generalize under distribution shifts due to error accumulation. Reinforcement learning (RL) offers a promising alternative by directly optimizing task performance through interaction, but existing attempts remain fragmented and lack a unified platform for fair and systematic comparison across model architectures and algorithmic designs. To address this gap, we introduce RLinf-VLA, a unified and efficient framework for scalable RL training of VLA models. The system adopts a highly flexible resource allocation design that addresses the challenge of integrating rendering, training, and inference in RL+VLA training. In particular, for GPU-parallelized simulators, RLinf-VLA implements a novel hybrid fine-grained pipeline allocation mode, achieving a 1.61x-1.88x speedup in training. Through a unified interface, RLinf-VLA seamlessly supports diverse VLA architectures (e.g., OpenVLA, OpenVLA-OFT), multiple RL algorithms (e.g., PPO, GRPO), and various simulators (e.g., ManiSkill, LIBERO). In simulation, a unified model achieves 98.11% across 130 LIBERO tasks and 97.66% across 25 ManiSkill tasks. Beyond empirical performance, our study distills a set of best practices for applying RL to VLA training and sheds light on emerging patterns in this integration. Furthermore, we present preliminary deployment on a real-world Franka robot, where RL-trained policies exhibit stronger generalization than those trained with SFT. We envision RLinf-VLA as a foundation to accelerate and standardize research on embodied intelligence.