Zheng Duo; Huang Shijia; Zhao Lin; Zhong Yiwu; Wang Liwei

Abstract
Building a generalist agent that can interact with the world is an intriguing goal of AI systems, and it has spurred research on embodied navigation, where an agent is required to navigate according to instructions or respond to queries. Despite the major progress attained, previous works primarily focus on task-specific agents and lack generalizability to unseen scenarios. Recently, LLMs have shown remarkable capabilities across various fields and provide a promising opportunity for embodied navigation. Drawing on this, we propose the first generalist model for embodied navigation, NaviLLM. It adapts LLMs to embodied navigation by introducing schema-based instruction. The schema-based instruction flexibly casts various tasks into generation problems, thereby unifying a wide range of tasks. This approach allows us to integrate diverse data sources from various datasets into the training, equipping NaviLLM with a wide range of capabilities required by embodied navigation. We conduct extensive experiments to evaluate the performance and generalizability of our model. The experimental results demonstrate that our unified model achieves state-of-the-art performance on CVDN, SOON, and ScanQA. Specifically, it surpasses the previous state-of-the-art method by a significant margin of 29% in goal progress on CVDN. Moreover, our model also demonstrates strong generalizability and presents impressive results on unseen tasks, e.g., embodied question answering and 3D captioning.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-question-answering-3d-qa-on-scanqa-test-w | NaviLLM | BLEU-1: 39.73; BLEU-4: 13.90; CIDEr: 80.77; Exact Match: 26.27; METEOR: 16.56; ROUGE: 40.23 |
| visual-navigation-on-cooperative-vision-and-1 | NaviLLM | dist_to_end_reduction: 7.90; spl: 0.09 |
| visual-navigation-on-room-to-room-1 | NaviLLM | spl: 0.60 |
| visual-navigation-on-soon-test | NaviLLM | Nav-SPL: 26.26; SR: 35.04 |
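The table reports SPL for several navigation benchmarks without defining it. The sketch below assumes the commonly used definition, Success weighted by Path Length: each episode contributes its success flag times the ratio of the shortest-path length to the longer of the shortest-path length and the path actually taken, averaged over episodes. The function name `spl` and its arguments are illustrative and are not taken from the NaviLLM code; the benchmarks listed here may compute the metric with different conventions, and the values in the table appear to mix a 0 to 1 scale with a percentage scale.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, on a 0-to-1 scale (assumed definition)."""
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(taken, shortest)
        # If both lengths are zero the agent started at the goal;
        # treat that as a perfectly efficient episode.
        total += 1.0 if denom == 0 else shortest / denom
    return total / n


# Three hypothetical episodes: optimal success, inefficient success, failure.
score = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 5.0])
print(f"SPL = {score:.2f}")
```

Multiplying the result by 100 gives the percentage form that some leaderboards report.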