6 months ago

3D Machine Vision

Depth Estimation

Computer Vision

Yushi Lan Yihang Luo Fangzhou Hong Shangchen Zhou Honghua Chen Zhaoyang Lyu Shuai Yang Bo Dai Chen Change Loy Xingang Pan

Abstract

We present STream3R, a novel approach to 3D reconstruction that reformulates pointmap prediction as a decoder-only Transformer problem. Existing state-of-the-art methods for multi-view reconstruction either depend on expensive global optimization or rely on simplistic memory mechanisms that scale poorly with sequence length. In contrast, STream3R introduces a streaming framework that processes image sequences efficiently using causal attention, inspired by advances in modern language modeling. By learning geometric priors from large-scale 3D datasets, STream3R generalizes well to diverse and challenging scenarios, including dynamic scenes where traditional methods often fail. Extensive experiments show that our method consistently outperforms prior work across both static and dynamic scene benchmarks. Moreover, STream3R is inherently compatible with LLM-style training infrastructure, enabling efficient large-scale pretraining and fine-tuning for various downstream 3D tasks. Our results underscore the potential of causal Transformer models for online 3D perception, paving the way for real-time 3D understanding in streaming environments. More details can be found on our project page: https://nirvanalan.github.io/projects/stream3r.
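The abstract states only that STream3R processes image sequences with causal attention, in the manner of decoder-only language models. The sketch below is a toy illustration of that general idea under stated assumptions, not the paper's architecture. It assumes each new frame's tokens attend to every token of their own frame and to the cached tokens of all earlier frames, and it uses identity projections (query = key = value = token) instead of learned weights. The names `attend`, `StreamingCausalAttention`, and `full_sequence_reference` are hypothetical. The example checks that caching keys and values per frame gives the same result as recomputing frame-causal masked attention over the whole prefix.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class StreamingCausalAttention:
    """Hypothetical toy: frame-causal attention with a key/value cache, no learned weights."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def step(self, frame_tokens: List[Vector]) -> List[Vector]:
        # Assumption: identity projections, so key = value = token.
        # Append the new frame first so its tokens can also attend to each other.
        self.keys.extend(frame_tokens)
        self.values.extend(frame_tokens)
        # Each new token sees all cached earlier frames plus its own frame, never future frames.
        return [attend(tok, self.keys, self.values) for tok in frame_tokens]


def full_sequence_reference(frames: List[List[Vector]]) -> List[List[Vector]]:
    """Offline reference: recompute attention from scratch with a frame-level causal mask."""
    outputs = []
    for t, frame in enumerate(frames):
        visible = [tok for f in frames[: t + 1] for tok in f]
        outputs.append([attend(tok, visible, visible) for tok in frame])
    return outputs


# Usage: three frames of two 2-D tokens each, fed one frame at a time.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, -1.0]],
    [[0.2, 0.9], [-0.3, 0.4]],
]
stream = StreamingCausalAttention()
streamed = [stream.step(f) for f in frames]
reference = full_sequence_reference(frames)
```

Caching avoids recomputing keys and values for past frames, but the cache still grows with sequence length, so the attention cost for each new frame increases linearly with the number of past tokens. In a real model the cached keys and values would come from learned per-layer projections rather than the raw tokens used here.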
