4 months ago

Hongzhi Zang, Mingjie Wei, Si Xu, Yongji Wu, Zhen Guo, Yuanqing Wang, Hao Lin, Liangzhi Shi, Yuqing Xie, Zhexuan Xu, and 7 more

Abstract

Recent progress in vision and language foundation models has significantly advanced multimodal understanding, reasoning, and generation, inspiring a surge of interest in extending such capabilities to embodied settings through vision-language-action (VLA) models. Yet, most VLA models are still trained with supervised fine-tuning (SFT), which struggles to generalize under distribution shifts due to error accumulation. Reinforcement learning (RL) offers a promising alternative by directly optimizing task performance through interaction, but existing attempts remain fragmented and lack a unified platform for fair and systematic comparison across model architectures and algorithmic designs. To address this gap, we introduce RLinf-VLA, a unified and efficient framework for scalable RL training of VLA models. The system adopts a highly flexible resource allocation design that addresses the challenge of integrating rendering, training, and inference in RL+VLA training. In particular, for GPU-parallelized simulators, RLinf-VLA implements a novel hybrid fine-grained pipeline allocation mode, achieving a 1.61x-1.88x speedup in training. Through a unified interface, RLinf-VLA seamlessly supports diverse VLA architectures (e.g., OpenVLA, OpenVLA-OFT), multiple RL algorithms (e.g., PPO, GRPO), and various simulators (e.g., ManiSkill, LIBERO). In simulation, a unified model achieves 98.11% across 130 LIBERO tasks and 97.66% across 25 ManiSkill tasks. Beyond empirical performance, our study distills a set of best practices for applying RL to VLA training and sheds light on emerging patterns in this integration. Furthermore, we present preliminary deployment on a real-world Franka robot, where RL-trained policies exhibit stronger generalization than those trained with SFT. We envision RLinf-VLA as a foundation to accelerate and standardize research on embodied intelligence.
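The abstract names GRPO as one of the supported RL algorithms without describing it. As a rough illustration only, the core idea usually associated with GRPO is to score each rollout relative to the other rollouts sampled for the same task, instead of using a learned value function. The sketch below is not taken from RLinf-VLA: the function name `group_relative_advantages`, the `eps` parameter, the use of population standard deviation, and the binary success rewards are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group (GRPO-style).

    Hypothetical helper for illustration; not the RLinf-VLA API.
    `rewards` holds one scalar return per rollout of the same task.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    # eps keeps the division finite when every rollout has the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed example group: 4 rollouts of one task with a binary success reward.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Under these assumptions, successful rollouts receive positive advantages and failed ones negative, while a group in which every rollout succeeds or every rollout fails yields advantages of zero and so contributes no learning signal.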

Reinforcement Learning

Multimodal Representation

Supervised Fine-Tuning

Method/Architecture

Multimodality

Task/Problem

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
