Christopher Choy, JunYoung Gwak, Silvio Savarese

Abstract
In many robotics and VR/AR applications, 3D videos are readily available sources of input (a continuous sequence of depth images or LIDAR scans). However, these 3D videos are processed frame by frame, either through 2D convnets or 3D perception algorithms. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D videos using high-dimensional convolutions. For this, we adopt sparse tensors and propose the generalized sparse convolution, which encompasses all discrete convolutions. To implement the generalized sparse convolution, we create an open-source auto-differentiation library for sparse tensors that provides extensive functions for high-dimensional convolutional neural networks. We create 4D spatio-temporal convolutional neural networks using the library and validate them on various 3D semantic segmentation benchmarks and on proposed 4D datasets for 3D-video perception. To overcome challenges in the 4D space, we propose the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field, which enforces spatio-temporal consistency in the 7D space-time-chroma space. Experimentally, we show that convolutional neural networks with only generalized 3D sparse convolutions can outperform 2D or 2D-3D hybrid methods by a large margin. We also show that, on 3D videos, 4D spatio-temporal convolutional neural networks are robust to noise, outperform 3D convolutional neural networks, and are faster than their 3D counterparts in some cases.
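The abstract describes the generalized sparse convolution and the hybrid kernel only at a high level. The pure-Python sketch below illustrates the idea under stated assumptions; it is not the library's API. A sparse tensor is stored as a hypothetical dictionary from integer coordinates to feature vectors, and every kernel offset `i` in an arbitrary offset set owns its own weight matrix `W_i`. For each output coordinate `u`, the output is the sum of `W_i x[u + i]` over only those offsets whose neighbour `u + i` exists in the input, so empty space costs nothing. Because the offset set is arbitrary, a non-cubic 4D kernel fits the same operator. The helper names (`generalized_sparse_conv`, `hypercube_offsets`, `hybrid_offsets`) and the exact shape of the hybrid kernel (a 3×3×3 spatial cube at the current time step plus the two purely temporal neighbours) are illustrative assumptions, not necessarily the kernel used in the paper.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, ...]      # integer coordinate, e.g. (x, y, z, t)
Matrix = List[List[float]]   # weight matrix of shape (C_out, C_in)


def generalized_sparse_conv(
    features: Dict[Coord, Sequence[float]],
    weights: Dict[Coord, Matrix],
    out_coords: Sequence[Coord],
) -> Dict[Coord, List[float]]:
    """Illustrative sketch (hypothetical helper, not a library API):
    out[u] = sum_i W_i @ x[u + i], summed only over kernel offsets i
    whose neighbour u + i is present in the sparse input."""
    c_out = len(next(iter(weights.values())))
    out: Dict[Coord, List[float]] = {}
    for u in out_coords:
        acc = [0.0] * c_out
        for offset, w in weights.items():
            neighbour = tuple(a + b for a, b in zip(u, offset))
            x = features.get(neighbour)
            if x is None:  # empty space: no input, no contribution
                continue
            for r in range(c_out):
                acc[r] += sum(w_rc * x_c for w_rc, x_c in zip(w[r], x))
        out[u] = acc
    return out


def hypercube_offsets(dim: int, radius: int = 1) -> List[Coord]:
    """All offsets in [-radius, radius]^dim (the usual dense-style kernel)."""
    return list(product(range(-radius, radius + 1), repeat=dim))


def hybrid_offsets(radius: int = 1) -> List[Coord]:
    """Hypothetical 4D hybrid kernel (assumed shape): a spatial cube at the
    current time step plus purely temporal offsets (0, 0, 0, +-t)."""
    spatial = [s + (0,) for s in hypercube_offsets(3, radius)]
    temporal = [(0, 0, 0, t) for t in range(-radius, radius + 1) if t != 0]
    return spatial + temporal


# Toy 4D sparse tensor: two points in space, observed at two time steps.
feats = {
    (0, 0, 0, 0): [1.0],
    (1, 0, 0, 0): [2.0],
    (0, 0, 0, 1): [3.0],
    (1, 0, 0, 1): [4.0],
}
coords = list(feats)  # reuse the input coordinates as output coordinates

# All-ones 1x1 weights make each output the plain sum of reachable features.
hybrid_out = generalized_sparse_conv(
    feats, {o: [[1.0]] for o in hybrid_offsets()}, coords)
cube_out = generalized_sparse_conv(
    feats, {o: [[1.0]] for o in hypercube_offsets(4)}, coords)

print(len(hybrid_offsets()), len(hypercube_offsets(4)))  # 29 81
print(hybrid_out)
# {(0, 0, 0, 0): [6.0], (1, 0, 0, 0): [7.0], (0, 0, 0, 1): [8.0], (1, 0, 0, 1): [9.0]}
print(cube_out)  # every point reaches all four points: [10.0] each
```

With all-ones weights, each output is simply the sum of the features reachable through the kernel, which makes the effect of the offset set easy to check by hand. The assumed hybrid kernel uses 29 offsets instead of the 81 of a full 3×3×3×3 hypercube, and it does not connect points that differ in both space and time, whereas the hypercube does. A practical implementation would typically precompute the input-to-output coordinate pairs per offset (for example with a hash table over coordinates) and batch the matrix multiplications, rather than looping point by point as this sketch does.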
Benchmarks
| Benchmark | Model | Metrics |
|---|---|---|
| 3d-semantic-segmentation-on-scannet-1 | MinkowskiNet | Top-1 IoU: 0.292; Top-3 IoU: 0.531 |
| 3d-semantic-segmentation-on-scannet200 | MinkUNet | Test mIoU: 25.3; Val mIoU: 25.0 |
| 3d-semantic-segmentation-on-scribblekitti | MinkowskiNet | mIoU: 55.0 |
| 3d-semantic-segmentation-on-stpls3d | MinkowskiNet | mIoU: 51.3 |
| robust-3d-semantic-segmentation-on | MinkUNet-18 | Mean Corruption Error (mCE): 100.00% |
| robust-3d-semantic-segmentation-on | MinkUNet-34 | Mean Corruption Error (mCE): 100.61% |
| robust-3d-semantic-segmentation-on-nuscenes-c | MinkUNet-34 | Mean Corruption Error (mCE): 96.37% |
| robust-3d-semantic-segmentation-on-nuscenes-c | MinkUNet-18 | Mean Corruption Error (mCE): 100.00% |
| robust-3d-semantic-segmentation-on-wod-c | MinkUNet-18 | Mean Corruption Error (mCE): 100.00% |
| robust-3d-semantic-segmentation-on-wod-c | MinkUNet-34 | Mean Corruption Error (mCE): 96.21% |
| semantic-segmentation-on-s3dis | MinkowskiNet | Mean IoU: 65.4; Params: 37.9M |
| semantic-segmentation-on-s3dis-area5 | MinkowskiNet | mIoU: 65.4; mAcc: 71.7; Params: 37.9M |
| semantic-segmentation-on-scannet | MinkowskiNet | Test mIoU: 73.4; Val mIoU: 72.2 |