Command Palette
Search for a command to run...
Angela Dai; Angel X. Chang; Manolis Savva; Maciej Halber; Thomas Funkhouser; Matthias Nießner

Abstract
A key requirement for leveraging supervised deep learning methods is the availability of large, labeled datasets. Unfortunately, in the context of RGB-D scene understanding, very little data is available -- current datasets cover a small range of scene views and have limited semantic annotations. To address this issue, we introduce ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations. To collect this data, we designed an easy-to-use and scalable RGB-D capture system that includes automated surface reconstruction and crowdsourced semantic annotation. We show that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks, including 3D object classification, semantic voxel labeling, and CAD model retrieval. The dataset is freely available at http://www.scan-net.org.
Code Repositories
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| semantic-segmentation-on-scannet | ScanNet | test mIoU: 30.6 |
| semantic-segmentation-on-scannetv2 | ScanNet (2d proj) | Mean IoU: 33.0% |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.