4 months ago

Semantic Segmentation

3D Machine Vision

Computer Vision

Angela Dai, Matthias Nießner

Abstract

We present 3DMV, a novel method for 3D semantic scene segmentation of RGB-D scans in indoor environments using a joint 3D-multi-view prediction network. In contrast to existing methods that use either geometry or RGB data as input for this task, we combine both data modalities in a joint, end-to-end network architecture. Rather than simply projecting color data into a volumetric grid and operating solely in 3D, which would result in insufficient detail, we first extract feature maps from associated RGB images. These features are then mapped into the volumetric feature grid of a 3D network using a differentiable backprojection layer. Since our target is 3D scanning scenarios with possibly many frames, we use a multi-view pooling approach in order to handle a varying number of RGB input views. This learned combination of RGB and geometric features with our joint 2D-3D architecture achieves significantly better results than existing baselines. For instance, our final result on the ScanNet 3D segmentation benchmark increases from 52.8% to 75% accuracy compared to existing volumetric architectures.
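
The two steps described in the abstract, backprojecting 2D feature maps into a voxel grid and pooling over a varying number of views, can be illustrated with a minimal NumPy sketch of the forward pass. This is not the authors' implementation: the function names, the pinhole camera model, nearest-pixel sampling, the depth tolerance `depth_tol`, and the choice of element-wise max as the pooling operator are all assumptions made for illustration. In an autodiff framework the same gather would let gradients flow back to the 2D features; NumPy shows only the forward computation.

```python
import numpy as np


def backproject_features(feat, depth, K, world_to_cam, voxel_centers, depth_tol=0.1):
    """Gather one 2D feature vector per voxel from a single view (hypothetical helper).

    feat:          (C, H, W) feature map from a 2D network
    depth:         (H, W) depth image, same units as the voxel coordinates
    K:             (3, 3) pinhole intrinsics (assumed camera model)
    world_to_cam:  (4, 4) rigid transform from world to camera space
    voxel_centers: (N, 3) voxel centres in world space
    Returns (N, C) features and an (N,) boolean mask of voxels seen by this view.
    """
    C, H, W = feat.shape
    N = voxel_centers.shape[0]

    # Move voxel centres into camera space.
    homog = np.concatenate([voxel_centers, np.ones((N, 1))], axis=1)
    cam = (world_to_cam @ homog.T).T[:, :3]
    z = cam[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negative depth

    # Pinhole projection to pixel coordinates, rounded to the nearest pixel.
    u = np.rint(K[0, 0] * cam[:, 0] / safe_z + K[0, 2]).astype(int)
    v = np.rint(K[1, 1] * cam[:, 1] / safe_z + K[1, 2]).astype(int)
    in_image = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u_c = np.clip(u, 0, W - 1)
    v_c = np.clip(v, 0, H - 1)

    # Depth test: keep a voxel only if it lies on the observed surface.
    visible = in_image & (np.abs(depth[v_c, u_c] - z) < depth_tol)

    out = np.zeros((N, C), dtype=float)
    out[visible] = feat[:, v_c[visible], u_c[visible]].T
    return out, visible


def multi_view_max_pool(per_view_feats, per_view_masks):
    """Element-wise max over views, ignoring views that do not see a voxel.

    Accepts any number of views, so the output shape does not depend on
    how many RGB frames are available. Voxels seen by no view get zeros.
    """
    feats = np.stack(per_view_feats)   # (V, N, C)
    masks = np.stack(per_view_masks)   # (V, N)
    masked = np.where(masks[..., None], feats, -np.inf)
    pooled = masked.max(axis=0)        # (N, C)
    seen = masks.any(axis=0)           # (N,)
    pooled[~seen] = 0.0
    return pooled, seen
```

Calling `backproject_features` once per RGB frame and passing the resulting lists to `multi_view_max_pool` yields a single `(N, C)` array that could be reshaped into a volumetric feature grid and combined with geometric features in a 3D network.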

