Robot Manipulation On Simpler Env
Evaluation Metrics
Variant Aggregation
Variant Aggregation-Move Near
Variant Aggregation-Open/Close Drawer
Variant Aggregation-Pick Coke Can
Visual Matching
Visual Matching-Move Near
Visual Matching-Open/Close Drawer
Visual Matching-Pick Coke Can
Evaluation Results
Performance of each model on this benchmark
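| Model | Variant Aggregation | Variant Aggregation-Move Near | Variant Aggregation-Open/Close Drawer | Variant Aggregation-Pick Coke Can | Visual Matching | Visual Matching-Move Near | Visual Matching-Open/Close Drawer | Visual Matching-Pick Coke Can | Paper Title | Repository |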
|---|---|---|---|---|---|---|---|---|---|---|
| SpatialVLA | 0.688 | 0.717 | 0.362 | 0.895 | 0.719 | 0.696 | 0.593 | 0.810 | SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model | - | 
| SoFar | 0.676 | 0.740 | 0.297 | 0.907 | 0.749 | 0.917 | 0.403 | 0.923 | SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | |
| RT-2-X | 0.661 | 0.792 | 0.353 | 0.823 | 0.606 | 0.779 | 0.250 | 0.787 | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | |
| RoboVLM | 0.463 | 0.560 | 0.085 | 0.683 | 0.563 | 0.663 | 0.268 | 0.727 | Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models | |
| TraceVLA | 0.450 | 0.564 | 0.310 | 0.600 | 0.460 | 0.600 | 0.240 | 0.560 | TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies | - | 
| OpenVLA | 0.411 | 0.477 | 0.177 | 0.545 | 0.277 | 0.462 | 0.356 | 0.163 | OpenVLA: An Open-Source Vision-Language-Action Model | |
| RT-1-X | 0.397 | 0.323 | 0.294 | 0.490 | 0.534 | 0.317 | 0.597 | 0.567 | RT-1: Robotics Transformer for Real-World Control at Scale | |
| Octo-Base | 0.012 | 0.031 | 0.011 | 0.006 | 0.168 | 0.042 | 0.227 | 0.170 | Octo: An Open-Source Generalist Robot Policy | - | 
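The rows above appear in descending order of the first score column. As a minimal sketch, the snippet below re-ranks the same models by a different column. It assumes the score columns follow the order of the metric list, so that the first and fifth score columns are the overall Variant Aggregation and Visual Matching scores. The `scores` dictionary and the `rank_by` helper are hypothetical names introduced for illustration; the numbers are copied from the table.

```python
# Overall scores copied from the table above, per model:
# (first score column, fifth score column), assumed here to be
# (Variant Aggregation, Visual Matching).
scores = {
    "SpatialVLA": (0.688, 0.719),
    "SoFar": (0.676, 0.749),
    "RT-2-X": (0.661, 0.606),
    "RoboVLM": (0.463, 0.563),
    "TraceVLA": (0.450, 0.460),
    "OpenVLA": (0.411, 0.277),
    "RT-1-X": (0.397, 0.534),
    "Octo-Base": (0.012, 0.168),
}


def rank_by(column: int) -> list[str]:
    """Return model names sorted by the chosen score column, best first."""
    return sorted(scores, key=lambda name: scores[name][column], reverse=True)


by_variant_aggregation = rank_by(0)
by_visual_matching = rank_by(1)

for position, name in enumerate(by_visual_matching, start=1):
    print(f"{position}. {name}: {scores[name][1]:.3f}")
```

Under that column assumption, the two rankings differ: SoFar and SpatialVLA swap the top two places, and RT-1-X moves ahead of TraceVLA and OpenVLA when sorted by the fifth score column.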