HyperAI



GigaBrain-0: A World Model-Powered Vision-Language-Action Model


Abstract

Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, and sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearance (e.g., textures, colors), object placement, and camera viewpoint. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.
