7 months ago

3D Machine Vision

Depth Estimation

Computer Vision

Zhengqing Wang Yuefan Wu Jiacheng Chen Fuyang Zhang Yasutaka Furukawa

Abstract

This paper proposes a neural rendering approach that represents a scene as "compressed light-field tokens (CLiFTs)", retaining rich appearance and geometric information of a scene. CLiFT enables compute-efficient rendering with compressed tokens, while allowing the number of tokens used to represent a scene or render a novel view to be changed with a single trained network. Concretely, given a set of images, a multi-view encoder tokenizes the images together with their camera poses. Latent-space K-means then uses the tokens to select a reduced set of rays as cluster centroids. The multi-view "condenser" compresses the information of all the tokens into the centroid tokens to construct CLiFTs. At test time, given a target view and a compute budget (i.e., the number of CLiFTs), the system collects the specified number of nearby tokens and synthesizes a novel view using a compute-adaptive renderer. Extensive experiments on the RealEstate10K and DL3DV datasets quantitatively and qualitatively validate the approach, achieving significant data reduction with comparable rendering quality and the highest overall rendering score, while providing trade-offs among data size, rendering quality, and rendering speed.
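The selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `kmeans_select` and `pick_for_budget`, the use of plain Euclidean K-means on token features, and the use of distance between ray origins and the target camera position as the notion of "nearby" are all assumptions made for illustration. The learned encoder, condenser, and renderer are not modeled here; tokens are stand-in random vectors.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(token, centroids):
    """Index of the centroid closest to the given token."""
    return min(range(len(centroids)), key=lambda j: sq_dist(token, centroids[j]))


def kmeans_select(tokens, k, iters=10, seed=0):
    """Plain K-means in token-feature space (assumed stand-in for the
    paper's latent-space K-means).

    Returns (selected, assign): `selected[j]` is the index of the token
    closest to centroid j, and `assign[i]` is the cluster of token i.
    """
    rng = random.Random(seed)
    centroids = [list(tokens[i]) for i in rng.sample(range(len(tokens)), k)]
    for _ in range(iters):
        assign = [nearest_centroid(t, centroids) for t in tokens]
        for j in range(k):
            members = [t for t, a in zip(tokens, assign) if a == j]
            if members:  # leave an empty cluster's centroid where it is
                dim = len(members[0])
                centroids[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    assign = [nearest_centroid(t, centroids) for t in tokens]
    # Snap each centroid to the closest actual token, so that a real ray
    # (not an averaged feature) represents the cluster.
    selected = [
        min(range(len(tokens)), key=lambda i: sq_dist(tokens[i], c))
        for c in centroids
    ]
    return selected, assign


def pick_for_budget(ray_origins, selected, target_origin, budget):
    """Keep the `budget` selected tokens whose ray origins lie closest to
    the target camera position (a hypothetical proximity measure)."""
    ranked = sorted(selected, key=lambda i: sq_dist(ray_origins[i], target_origin))
    return ranked[:budget]


if __name__ == "__main__":
    rng = random.Random(42)
    tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]
    origins = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(100)]
    selected, assign = kmeans_select(tokens, k=10)
    kept = pick_for_budget(origins, selected, [0.0, 0.0, 0.0], budget=4)
    print("centroid tokens:", selected)
    print("tokens kept for this view:", kept)
```

In this sketch the compute budget only changes how many centroid tokens are handed to the renderer, which mirrors the abstract's claim that one trained network can serve different token counts.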

CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering