4 months ago

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Fangfu Liu Hao Li Jiawei Chi Hanyang Wang Minghui Yang Fudong Wang Yueqi Duan

Abstract

Recovering 3D structures with open-vocabulary scene understanding from 2D images is a fundamental but daunting task. Recent developments have achieved this by performing per-scene optimization with embedded language information. However, they heavily rely on the calibrated dense-view reconstruction paradigm, thereby suffering from severe rendering artifacts and implausible semantic synthesis when limited views are available. In this paper, we introduce a novel generative framework, coined LangScene-X, to unify and generate 3D consistent multi-modality information for reconstruction and understanding. Powered by the generative capability of creating more consistent novel observations, we can build generalizable 3D language-embedded scenes from only sparse views. Specifically, we first train a TriMap video diffusion model that can generate appearance (RGBs), geometry (normals), and semantics (segmentation maps) from sparse inputs through progressive knowledge integration. Furthermore, we propose a Language Quantized Compressor (LQC), trained on large-scale image datasets, to efficiently encode language embeddings, enabling cross-scene generalization without per-scene retraining. Finally, we reconstruct the language surface fields by aligning language information onto the surface of 3D scenes, enabling open-ended language queries. Extensive experiments on real-world data demonstrate the superiority of our LangScene-X over state-of-the-art methods in terms of quality and generalizability. Project Page: https://liuff19.github.io/LangScene-X.
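
To make the idea of an open-ended language query concrete, here is a minimal sketch. It assumes (the abstract does not say so) that each surface point carries a language feature vector and that a query is answered by cosine similarity against a text embedding. The function names, the threshold, and the toy 3-dimensional vectors are all hypothetical and are not taken from LangScene-X.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def query_points(point_features, text_embedding, threshold=0.8):
    """Return indices of surface points whose language feature matches the query.

    Hypothetical helper: a real system would use high-dimensional embeddings
    from a vision-language model rather than hand-written vectors.
    """
    return [
        i for i, feature in enumerate(point_features)
        if cosine(feature, text_embedding) >= threshold
    ]

# Toy per-point language features (assumed, for illustration only).
point_features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
# Toy embedding of a text query (assumed).
text_embedding = [1.0, 0.0, 0.0]

matches = query_points(point_features, text_embedding)
print(matches)
```

In this toy setup the first two points align with the query direction and the third is orthogonal to it, so only the first two are selected. The threshold trades recall against precision and would need tuning on real embeddings.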
