7 months ago

Wenyao Zhang Hongsi Liu Zekun Qi Yunnan Wang XinQiang Yu Jiazhao Zhang Runpei Dong Jiawei He He Wang Zhizheng Zhang

Abstract

Recent advances in vision-language-action (VLA) models have shown promise in integrating image generation with action prediction to improve generalization and reasoning in robot manipulation. However, existing methods are limited to challenging image-based forecasting, which suffers from redundant information and lacks comprehensive and critical world knowledge, including dynamic, spatial and semantic information. To address these limitations, we propose DreamVLA, a novel VLA framework that integrates comprehensive world knowledge forecasting to enable inverse dynamics modeling, thereby establishing a perception-prediction-action loop for manipulation tasks. Specifically, DreamVLA introduces a dynamic-region-guided world knowledge prediction, integrated with the spatial and semantic cues, which provide compact yet comprehensive representations for action planning. This design aligns with how humans interact with the world by first forming abstract multimodal reasoning chains before acting. To mitigate interference among the dynamic, spatial and semantic information during training, we adopt a block-wise structured attention mechanism that masks their mutual attention, preventing information leakage and keeping each representation clean and disentangled. Moreover, to model the conditional distribution over future actions, we employ a diffusion-based transformer that disentangles action representations from shared latent features. Extensive experiments on both real-world and simulation environments demonstrate that DreamVLA achieves 76.7% success rate on real robot tasks and 4.44 average length on the CALVIN ABC-D benchmarks.
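The block-wise structured attention described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `build_blockwise_mask`, the token layout (shared context tokens followed by one group each for dynamic, spatial and semantic queries), and the choice to let context tokens attend only to each other are all assumptions made for illustration. The sketch only shows the general idea of a mask in which each knowledge group sees the shared context and itself, but not the other groups.

```python
def build_blockwise_mask(n_context, group_sizes):
    """Build a boolean attention mask as a list of lists.

    mask[i][j] is True when token i may attend to token j.
    Layout (hypothetical): n_context shared tokens first, then one
    contiguous block of tokens per knowledge group.
    """
    total = n_context + sum(group_sizes)
    mask = [[False] * total for _ in range(total)]

    # Assumption: context tokens attend only to other context tokens.
    for i in range(n_context):
        for j in range(n_context):
            mask[i][j] = True

    start = n_context
    for size in group_sizes:
        end = start + size
        for i in range(start, end):
            # Each group token may read the shared context ...
            for j in range(n_context):
                mask[i][j] = True
            # ... and tokens of its own group, but no other group.
            for j in range(start, end):
                mask[i][j] = True
        start = end
    return mask


# Hypothetical sizes: 4 context tokens, 2 tokens each for the
# dynamic, spatial and semantic groups.
mask = build_blockwise_mask(n_context=4, group_sizes=[2, 2, 2])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In a real model, a mask of this shape would typically be passed to the attention layer so that disallowed positions are excluded before the softmax; the group sizes and ordering here are placeholders.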

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge
