StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling

Abstract

Vision-and-Language Navigation (VLN) in real-world settings requires agents to process continuous visual streams and generate actions with low latency, grounded in language instructions. While Video-based Large Language Models (Video-LLMs) have driven recent progress, current VLN methods based on Video-LLMs often face trade-offs among fine-grained visual understanding, long-term context modeling, and computational efficiency. We introduce StreamVLN, a streaming VLN framework that employs a hybrid slow-fast context modeling strategy to support multi-modal reasoning over interleaved vision, language, and action inputs. The fast-streaming dialogue context facilitates responsive action generation through a sliding window of active dialogues, while the slow-updating memory context compresses historical visual states using a 3D-aware token pruning strategy. With this slow-fast design, StreamVLN achieves coherent multi-turn dialogue through efficient KV cache reuse, supporting long video streams with bounded context size and inference cost. Experiments on VLN-CE benchmarks demonstrate state-of-the-art performance with stable low latency, ensuring robustness and efficiency in real-world deployment. The project page is: https://streamvln.github.io/.
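The slow-fast split described above can be sketched as a small bookkeeping structure: a bounded sliding window holds the recent dialogue turns (fast context), and turns evicted from the window are compressed by token pruning into a long-term memory (slow context). This is only an illustrative sketch, not the authors' implementation; the class name, the first-k pruning rule (a stand-in for the paper's 3D-aware pruning), and all parameters are assumptions.

```python
from collections import deque

class SlowFastContext:
    """Illustrative sketch of slow-fast context modeling (not StreamVLN's
    actual code): a fixed-size window of recent turns plus a slowly
    growing memory of pruned visual tokens from evicted turns."""

    def __init__(self, window_size=8, prune_ratio=0.25):
        self.window = deque(maxlen=window_size)  # fast: active dialogue turns
        self.memory = []                         # slow: compressed visual history
        self.prune_ratio = prune_ratio           # fraction of visual tokens kept

    def add_turn(self, visual_tokens, text_tokens):
        # Before the oldest turn falls out of the window, compress its
        # visual tokens into the slow memory.
        if len(self.window) == self.window.maxlen:
            evicted_visual, _ = self.window[0]
            self.memory.extend(self._prune(evicted_visual))
        self.window.append((visual_tokens, text_tokens))

    def _prune(self, tokens):
        # Stand-in for 3D-aware token pruning: keep the first fraction
        # of tokens. The real method selects tokens by spatial redundancy.
        keep = max(1, int(len(tokens) * self.prune_ratio))
        return tokens[:keep]

    def context(self):
        # The model's input: slow memory first, then the fast window.
        flat = list(self.memory)
        for visual, text in self.window:
            flat.extend(visual)
            flat.extend(text)
        return flat
```

For example, with `window_size=2` and `prune_ratio=0.5`, streaming five turns of two visual tokens each keeps only the two most recent turns verbatim, while each older turn survives as a single pruned token in memory, so the context stays small regardless of stream length.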
