SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Abstract
Vision-Language-Action (VLA) models have recently emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of the large-scale human-operated robotic trajectories required for SFT scaling, and (ii) limited generalization to tasks involving distribution shift. Recent breakthroughs in Large Reasoning Models (LRMs) demonstrate that reinforcement learning (RL) can dramatically enhance step-by-step reasoning capabilities, raising a natural question: can RL similarly improve the long-horizon, step-by-step action planning of VLA models? In this work, we introduce SimpleVLA-RL, an efficient RL framework tailored for VLA models. Building upon veRL, we introduce VLA-specific trajectory sampling, scalable parallelization, multi-environment rendering, and optimized loss computation. When applied to OpenVLA-OFT, SimpleVLA-RL achieves state-of-the-art (SoTA) performance on LIBERO and even outperforms π0 on RoboTwin 1.0 & 2.0 with the exploration-enhancing strategies we introduce. SimpleVLA-RL not only reduces dependence on large-scale data and enables robust generalization, but also markedly surpasses SFT on real-world tasks. Moreover, we identify a novel phenomenon, "pushcut", during RL training, wherein the policy discovers action patterns never seen in the preceding training stages. GitHub: https://github.com/PRIME-RL/SimpleVLA-RL
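The abstract names the framework's components (trajectory sampling, parallelization, multi-environment rendering, loss computation) without detail. Below is a minimal, hypothetical sketch of what such an RL loop could look like: rollouts sampled from a VLA-style policy across several environments, each scored with a sparse binary task-success reward, followed by a simple policy-gradient update. All names here (DummyEnv, VLAPolicy, rollout) are illustrative stand-ins, not the SimpleVLA-RL or veRL API.

```python
import torch

class DummyEnv:
    """Stand-in for one rendered manipulation environment (not a real benchmark)."""
    def reset(self):
        return torch.randn(8)                          # fake observation vector

    def step(self, action):
        obs = torch.randn(8)
        done = bool(torch.rand(()) < 0.1)              # episode ends stochastically
        success = done and bool(torch.rand(()) < 0.5)  # binary task outcome
        return obs, float(success), done

class VLAPolicy(torch.nn.Module):
    """Stand-in for a VLA model emitting a categorical action distribution."""
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.net = torch.nn.Linear(obs_dim, n_actions)

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def rollout(policy, env, max_steps=50):
    """Sample one trajectory; return per-step log-probs and the terminal reward."""
    obs, log_probs, reward = env.reset(), [], 0.0
    for _ in range(max_steps):
        dist = policy(obs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, done = env.step(action)
        if done:
            break
    return torch.stack(log_probs), reward

policy = VLAPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
envs = [DummyEnv() for _ in range(4)]                  # multi-environment sampling

for iteration in range(10):
    losses = []
    for env in envs:
        log_probs, reward = rollout(policy, env)
        # REINFORCE-style objective: the sparse binary outcome is the only signal.
        losses.append(-reward * log_probs.sum())
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this sketch the environments are stepped sequentially and the update is plain REINFORCE; the actual framework described in the paper parallelizes rendering and sampling at scale and optimizes the loss computation, details not given in the abstract.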