Abstract
Imitation-learning-based visuomotor policies have been widely used in robot manipulation, where visual observations and proprioceptive states are typically adopted together for precise control. However, in this study, we find that this common practice makes the policy overly reliant on the proprioceptive state input, which causes overfitting to the training trajectories and results in poor spatial generalization. In contrast, we propose the State-free Policy, which removes the proprioceptive state input and predicts actions conditioned only on visual observations. The State-free Policy is built in the relative end-effector action space and requires full task-relevant visual observations, here provided by dual wide-angle wrist cameras. Empirical results demonstrate that the State-free Policy achieves significantly stronger spatial generalization than the state-based policy: in real-world tasks such as pick-and-place, challenging shirt folding, and complex whole-body manipulation, spanning multiple robot embodiments, the average success rate improves from 0% to 85% in height generalization and from 6% to 64% in horizontal generalization. Furthermore, State-free Policies also show advantages in data efficiency and cross-embodiment adaptation, enhancing their practicality for real-world deployment.
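To make the core design concrete, below is a minimal PyTorch sketch of a state-free policy under assumed choices: a small shared CNN encoder and an MLP action head, which are illustrative stand-ins rather than the paper's actual backbone or action head. The point of the sketch is the input interface: only the two wrist-camera images enter the network, no proprioceptive state is concatenated anywhere, and the outputs are read as a chunk of relative end-effector actions.

```python
import torch
import torch.nn as nn


class StateFreePolicy(nn.Module):
    """Minimal sketch of a state-free visuomotor policy.

    Actions are predicted only from the two wide-angle wrist-camera
    images; no proprioceptive state is used, and outputs are
    interpreted as relative end-effector action deltas rather than
    absolute poses. The encoder and head here are assumptions for
    illustration, not the paper's architecture.
    """

    def __init__(self, action_dim: int = 7, chunk: int = 8):
        super().__init__()
        # Shared CNN encoder applied to each wrist-camera view.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Head maps concatenated view features to a chunk of relative
        # end-effector actions (e.g., delta pose plus gripper command).
        self.head = nn.Sequential(
            nn.Linear(64 * 2, 256), nn.ReLU(),
            nn.Linear(256, action_dim * chunk),
        )
        self.action_dim, self.chunk = action_dim, chunk

    def forward(self, left_wrist: torch.Tensor, right_wrist: torch.Tensor) -> torch.Tensor:
        # Note: proprioceptive state never enters the network.
        feats = torch.cat(
            [self.encoder(left_wrist), self.encoder(right_wrist)], dim=-1
        )
        return self.head(feats).view(-1, self.chunk, self.action_dim)


if __name__ == "__main__":
    policy = StateFreePolicy()
    left = torch.randn(1, 3, 128, 128)
    right = torch.randn(1, 3, 128, 128)
    rel_actions = policy(left, right)  # shape (1, 8, 7): relative EE action chunk
    print(rel_actions.shape)
```

A state-based baseline would differ only by concatenating the robot's proprioceptive state (joint angles or end-effector pose) with the visual features before the action head; the abstract's claim is that dropping that input, while working in the relative end-effector action space with sufficiently informative wrist views, is what yields the improved spatial generalization.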