a month ago

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training


Abstract

Recent progress in vision and language foundation models has significantly advanced multimodal understanding, reasoning, and generation, inspiring a surge of interest in extending such capabilities to embodied settings through vision-language-action (VLA) models. Yet, most VLA models are still trained with supervised fine-tuning (SFT), which struggles to generalize under distribution shifts due to error accumulation. Reinforcement learning (RL) offers a promising alternative by directly optimizing task performance through interaction, but existing attempts remain fragmented and lack a unified platform for fair and systematic comparison across model architectures and algorithmic designs. To address this gap, we introduce RLinf-VLA, a unified and efficient framework for scalable RL training of VLA models. The system adopts a highly flexible resource allocation design that addresses the challenge of integrating rendering, training, and inference in RL+VLA training. In particular, for GPU-parallelized simulators, RLinf-VLA implements a novel hybrid fine-grained pipeline allocation mode, achieving a 1.61x-1.88x speedup in training. Through a unified interface, RLinf-VLA seamlessly supports diverse VLA architectures (e.g., OpenVLA, OpenVLA-OFT), multiple RL algorithms (e.g., PPO, GRPO), and various simulators (e.g., ManiSkill, LIBERO). In simulation, a unified model achieves 98.11% across 130 LIBERO tasks and 97.66% across 25 ManiSkill tasks. Beyond empirical performance, our study distills a set of best practices for applying RL to VLA training and sheds light on emerging patterns in this integration. Furthermore, we present preliminary deployment on a real-world Franka robot, where RL-trained policies exhibit stronger generalization than those trained with SFT. We envision RLinf-VLA as a foundation to accelerate and standardize research on embodied intelligence.
