Abstract
Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-source, widely applicable, generalist policies for robotic manipulation. As a first step, we introduce Octo, a large transformer-based policy trained on 800k trajectories from the Open X-Embodiment dataset, the largest robot manipulation dataset to date. It can be instructed via language commands or goal images and can be effectively finetuned to robot setups with new sensory inputs and action spaces within a few hours on standard consumer GPUs. In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces. We also perform detailed ablations of design decisions for the Octo model, from architecture to training data, to guide future research on building generalist robot models.
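To make "finetuned to new observation and action spaces" concrete, here is a toy sketch of one common pattern: keep the pretrained shared weights and attach a freshly initialized output head sized for the new robot's action dimension, then finetune from there. Everything below (`ToyPolicy`, `adapt_to_action_space`, the linear-plus-ReLU encoder, the Gaussian head initialization) is a hypothetical illustration in plain Python, not Octo's architecture or API.

```python
import copy
import random
from dataclasses import dataclass
from typing import List


def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for a pretrained generalist policy."""
    backbone: List[List[float]]  # [embed_dim][obs_dim], shared pretrained weights
    head: List[List[float]]      # [action_dim][embed_dim], robot-specific output layer

    @property
    def embed_dim(self) -> int:
        return len(self.backbone)

    @property
    def action_dim(self) -> int:
        return len(self.head)

    def act(self, obs: List[float]) -> List[float]:
        # Shared encoder (linear + ReLU), then a linear action head.
        embedding = [max(0.0, z) for z in matvec(self.backbone, obs)]
        return matvec(self.head, embedding)


def adapt_to_action_space(policy: ToyPolicy, new_action_dim: int,
                          seed: int = 0) -> ToyPolicy:
    """Hypothetical helper: reuse the backbone, re-initialize the action head."""
    rng = random.Random(seed)
    scale = policy.embed_dim ** -0.5  # keep initial outputs at a modest scale
    new_head = [[rng.gauss(0.0, scale) for _ in range(policy.embed_dim)]
                for _ in range(new_action_dim)]
    # Deep-copy the backbone so finetuning the new policy leaves the original intact.
    return ToyPolicy(backbone=copy.deepcopy(policy.backbone), head=new_head)


# Usage: a 7-dimensional pretrained head is swapped for a 14-dimensional one,
# e.g. moving from a single arm to a hypothetical bimanual setup.
pretrained = ToyPolicy(
    backbone=[[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]],
    head=[[1.0, 0.0] for _ in range(7)],
)
adapted = adapt_to_action_space(pretrained, new_action_dim=14)
print(pretrained.action_dim, adapted.action_dim)
```

Only the head is new here; in practice the whole adapted model would then be trained on the in-domain data, and new sensors would similarly need their own input encoders.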
Benchmarks
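| Benchmark | Model | Metric | Success rate |
|---|---|---|---|
| robot-manipulation-on-simpler-env | Octo-Base | Variant Aggregation | 0.012 |
| robot-manipulation-on-simpler-env | Octo-Base | Variant Aggregation-Move Near | 0.031 |
| robot-manipulation-on-simpler-env | Octo-Base | Variant Aggregation-Open/Close Drawer | 0.011 |
| robot-manipulation-on-simpler-env | Octo-Base | Variant Aggregation-Pick Coke Can | 0.006 |
| robot-manipulation-on-simpler-env | Octo-Base | Visual Matching | 0.168 |
| robot-manipulation-on-simpler-env | Octo-Base | Visual Matching-Move Near | 0.042 |
| robot-manipulation-on-simpler-env | Octo-Base | Visual Matching-Open/Close Drawer | 0.227 |
| robot-manipulation-on-simpler-env | Octo-Base | Visual Matching-Pick Coke Can | 0.170 |
| robot-manipulation-on-simplerenv-widow-x | Octo-Small | Average | 0.300 |
| robot-manipulation-on-simplerenv-widow-x | Octo-Small | Put Carrot on Plate | 0.097 |
| robot-manipulation-on-simplerenv-widow-x | Octo-Small | Put Spoon on Towel | 0.472 |
| robot-manipulation-on-simplerenv-widow-x | Octo-Small | Stack Green Block on Yellow Block | 0.042 |
| robot-manipulation-on-simplerenv-widow-x | Octo-Base | Average | 0.160 |
| robot-manipulation-on-simplerenv-widow-x | Octo-Base | Put Carrot on Plate | 0.083 |
| robot-manipulation-on-simplerenv-widow-x | Octo-Base | Put Spoon on Towel | 0.125 |
| robot-manipulation-on-simplerenv-widow-x | Octo-Base | Stack Green Block on Yellow Block | 0.000 |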