VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

Abstract
Vision-Language-Action (VLA) models enable embodied decision-making but rely heavily on imitation learning, leading to compounding errors and poor robustness under distribution shift. Reinforcement learning (RL) can mitigate these issues yet typically demands costly real-world interactions or suffers from sim-to-real gaps. We introduce VLA-RFT, a reinforcement fine-tuning framework that leverages a data-driven world model as a controllable simulator. Trained from real interaction data, the simulator predicts future visual observations conditioned on actions, allowing policy rollouts with dense, trajectory-level rewards derived from goal-achieving references. This design delivers an efficient and action-aligned learning signal, drastically lowering sample requirements. With fewer than 400 fine-tuning steps, VLA-RFT surpasses strong supervised baselines and achieves greater efficiency than simulator-based RL. Moreover, it exhibits strong robustness under perturbed conditions, sustaining stable task execution. Our results establish world-model-based RFT as a practical post-training paradigm to enhance the generalization and robustness of VLA models. For more details, please refer to https://vla-rft.github.io/.
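
To make the described rollout-and-reward loop concrete, below is a minimal, illustrative sketch of how a policy could be rolled out inside a learned world model and scored against a goal-achieving reference trajectory. This is not the authors' implementation: the callable names (policy, world_model), the horizon, and the specific step reward (negative L2 distance to the reference frame) are all assumptions made for illustration.

```python
import torch

def rollout_with_dense_reward(policy, world_model, instruction, obs,
                              reference, horizon=16):
    """Roll the policy out inside a learned world model and score each step
    against a goal-achieving reference trajectory (dense, per-step reward)."""
    log_probs, rewards = [], []
    for t in range(horizon):
        # Policy proposes an action conditioned on the instruction and current observation.
        action, log_prob = policy(instruction, obs)
        # World model predicts the next visual observation given the action.
        obs = world_model(obs, action)
        # One possible dense reward: closeness to the reference frame at step t.
        rewards.append(-torch.norm(obs - reference[t]))
        log_probs.append(log_prob)
    # Trajectory-level return used as the reinforcement fine-tuning signal.
    return torch.stack(log_probs), torch.stack(rewards).sum()

# Toy usage with dummy stand-ins for the policy and world model.
obs0 = torch.zeros(3, 64, 64)
reference = [torch.randn(3, 64, 64) for _ in range(16)]
policy = lambda instr, o: (torch.randn(7), torch.tensor(0.0))
world_model = lambda o, a: o + 0.01 * torch.randn_like(o)
log_probs, ret = rollout_with_dense_reward(policy, world_model,
                                           "pick up the red block", obs0,
                                           reference)
print(ret)
```

Because every step of the rollout is scored, the policy receives a dense, action-aligned signal without real-world interaction, which is what allows the abstract's claim of effective fine-tuning in relatively few steps.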