SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Abstract
Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of the large-scale human-operated robotic trajectories required for SFT scaling, and (ii) limited generalization to tasks involving distribution shift. Recent breakthroughs in Large Reasoning Models (LRMs) demonstrate that reinforcement learning (RL) can dramatically enhance step-by-step reasoning capabilities, raising a natural question: can RL similarly improve the long-horizon, step-by-step action planning of VLA models? In this work, we introduce SimpleVLA-RL, an efficient RL framework tailored for VLA models. Building upon veRL, we introduce VLA-specific trajectory sampling, scalable parallelization, multi-environment rendering, and optimized loss computation. When applied to OpenVLA-OFT, SimpleVLA-RL achieves state-of-the-art (SoTA) performance on LIBERO and even outperforms π0 on RoboTwin 1.0 & 2.0 with the exploration-enhancing strategies we introduce. SimpleVLA-RL not only reduces dependence on large-scale data and enables robust generalization, but also markedly surpasses SFT on real-world tasks. Moreover, we identify a novel phenomenon, "pushcut", during RL training, wherein the policy discovers action patterns never seen in the preceding training stages. GitHub: https://github.com/PRIME-RL/SimpleVLA-RL
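The abstract names the framework's components (trajectory sampling, parallelization, multi-environment rendering, loss computation) without detail. Below is a minimal, hypothetical sketch of what such an RL loop could look like: rollouts sampled from a VLA-style policy across several environments, each scored with a sparse binary task-success reward, followed by a simple policy-gradient update. All names here (DummyEnv, VLAPolicy, rollout) are illustrative stand-ins, not the SimpleVLA-RL or veRL API.

```python
import torch

class DummyEnv:
    """Stand-in for one rendered manipulation environment (not a real benchmark)."""
    def reset(self):
        return torch.randn(8)                          # fake observation vector

    def step(self, action):
        obs = torch.randn(8)
        done = bool(torch.rand(()) < 0.1)              # episode ends stochastically
        success = done and bool(torch.rand(()) < 0.5)  # binary task outcome
        return obs, float(success), done

class VLAPolicy(torch.nn.Module):
    """Stand-in for a VLA model emitting a categorical action distribution."""
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.net = torch.nn.Linear(obs_dim, n_actions)

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def rollout(policy, env, max_steps=50):
    """Sample one trajectory; return per-step log-probs and the terminal reward."""
    obs, log_probs, reward = env.reset(), [], 0.0
    for _ in range(max_steps):
        dist = policy(obs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action)
        if done:
            break
    return torch.stack(log_probs), reward

policy = VLAPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
envs = [DummyEnv() for _ in range(4)]                  # multi-environment sampling

for iteration in range(10):
    losses = []
    for env in envs:
        log_probs, reward = rollout(policy, env)
        # REINFORCE-style objective: the sparse binary outcome is the only signal.
        losses.append(-reward * log_probs.sum())
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch the environments are stepped sequentially and the update is plain REINFORCE; the actual framework described in the paper parallelizes rendering and sampling at scale and optimizes the loss computation, details not given in the abstract.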