8 months ago

Ruijie Zheng Yongyuan Liang Shuaiyi Huang Jianfeng Gao Hal Daumé III Andrey Kolobov Furong Huang Jianwei Yang

Abstract

Although large vision-language-action (VLA) models pretrained on extensive robot datasets offer promising generalist policies for robotic learning, they still struggle with spatial-temporal dynamics in interactive robotics, making them less effective in handling complex tasks such as manipulation. In this work, we introduce visual trace prompting, a simple yet effective approach to facilitate VLA models' spatial-temporal awareness for action prediction by encoding state-action trajectories visually. We develop a new TraceVLA model by finetuning OpenVLA on our own collected dataset of 150K robot manipulation trajectories using visual trace prompting. Evaluations of TraceVLA across 137 configurations in SimplerEnv and 4 tasks on a physical WidowX robot demonstrate state-of-the-art performance, outperforming OpenVLA by 10% on SimplerEnv and 3.5x on real-robot tasks and exhibiting robust generalization across diverse embodiments and scenarios. To further validate the effectiveness and generality of our method, we present a compact VLA model based on the 4B Phi-3-Vision which, pretrained on Open-X-Embodiment and finetuned on our dataset, rivals the 7B OpenVLA baseline while significantly improving inference efficiency.
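The abstract does not say how the traces are rendered, so the following is only a minimal sketch of the general idea of "encoding a trajectory visually": past 2D positions of a tracked point are drawn as a polyline onto the current camera frame, and the annotated frame is what a model would then see. The function name `overlay_trace`, the list-of-rows image representation, and the red trace color are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), assumed 8-bit channels
Image = List[List[Pixel]]      # image[y][x], a hypothetical minimal image type
Point = Tuple[int, int]        # (x, y) in pixel coordinates


def overlay_trace(image: Image, trace: List[Point],
                  color: Pixel = (255, 0, 0)) -> Image:
    """Return a copy of `image` with the polyline `trace` drawn onto it.

    Consecutive trace points are joined by linearly interpolated pixels.
    Pixels that fall outside the image are skipped. A trace with fewer
    than two points draws nothing.
    """
    height, width = len(image), len(image[0])
    out = [list(row) for row in image]  # copy rows so the input is untouched
    for (x0, y0), (x1, y1) in zip(trace, trace[1:]):
        # One step per pixel along the longer axis of the segment.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            x = round(x0 + (x1 - x0) * i / steps)
            y = round(y0 + (y1 - y0) * i / steps)
            if 0 <= x < width and 0 <= y < height:
                out[y][x] = color
    return out


# Usage: a 5x5 black frame with an L-shaped trace of a tracked point.
frame = [[(0, 0, 0)] * 5 for _ in range(5)]
annotated = overlay_trace(frame, [(0, 0), (4, 0), (4, 4)])
```

In a real pipeline the trace points would come from a point tracker run over recent frames and the drawing would be done with an image library; the pure-Python version here just keeps the example self-contained.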

Robotics

Embodied Intelligence

TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
