
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

Christopher Choy; JunYoung Gwak; Silvio Savarese

Abstract

In many robotics and VR/AR applications, 3D-videos are readily-available sources of input (a continuous sequence of depth images, or LIDAR scans). However, those 3D-videos are processed frame-by-frame either through 2D convnets or 3D perception algorithms. In this work, we propose 4-dimensional convolutional neural networks for spatio-temporal perception that can directly process such 3D-videos using high-dimensional convolutions. For this, we adopt sparse tensors and propose the generalized sparse convolution that encompasses all discrete convolutions. To implement the generalized sparse convolution, we create an open-source auto-differentiation library for sparse tensors that provides extensive functions for high-dimensional convolutional neural networks. We create 4D spatio-temporal convolutional neural networks using the library and validate them on various 3D semantic segmentation benchmarks and proposed 4D datasets for 3D-video perception. To overcome challenges in the 4D space, we propose the hybrid kernel, a special case of the generalized sparse convolution, and the trilateral-stationary conditional random field that enforces spatio-temporal consistency in the 7D space-time-chroma space. Experimentally, we show that convolutional neural networks with only generalized 3D sparse convolutions can outperform 2D or 2D-3D hybrid methods by a large margin. Also, we show that on 3D-videos, 4D spatio-temporal convolutional neural networks are robust to noise, outperform 3D convolutional neural networks and are faster than the 3D counterpart in some cases.
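
To make the idea of a convolution over a sparse tensor concrete, here is a minimal pure-Python sketch. It is not the library's API or the authors' implementation. It assumes a common formulation in which only occupied coordinates are stored, the kernel is an arbitrary set of integer offsets (each with its own weight matrix), and an output feature is the sum of weighted input features at the occupied neighbors. The names `sparse_conv`, `features`, and `weights` are hypothetical and chosen for illustration only.

```python
def sparse_conv(features, weights, out_coords=None):
    """Convolve a sparse set of points with a kernel defined on arbitrary offsets.

    features:   dict mapping a coordinate tuple (e.g. (x, y, z, t)) to a list
                of C_in floats. Coordinates not in the dict are empty space.
    weights:    dict mapping an offset tuple to a C_out x C_in matrix
                (a list of rows). The offsets need not form a dense hypercube.
    out_coords: coordinates at which to compute outputs; defaults to the
                input coordinates (an assumption made for this sketch).
    """
    if out_coords is None:
        out_coords = list(features)
    n_out = len(next(iter(weights.values())))
    result = {}
    for u in out_coords:
        acc = [0.0] * n_out
        for offset, w in weights.items():
            neighbor = tuple(a + b for a, b in zip(u, offset))
            x = features.get(neighbor)
            if x is None:
                continue  # unoccupied neighbors contribute nothing
            for o, row in enumerate(w):
                acc[o] += sum(wi * xi for wi, xi in zip(row, x))
        result[u] = acc
    return result


# Three occupied points in a 4D (x, y, z, t) grid, one feature channel each.
features = {
    (0, 0, 0, 0): [1.0],
    (1, 0, 0, 0): [2.0],
    (0, 0, 0, 1): [3.0],
}
# A non-cubic kernel: the center, one step along x, and one step along t.
weights = {
    (0, 0, 0, 0): [[1.0]],
    (1, 0, 0, 0): [[0.5]],
    (0, 0, 0, 1): [[2.0]],
}
out = sparse_conv(features, weights)
print(out[(0, 0, 0, 0)])  # [8.0]: 1*1.0 + 0.5*2.0 + 2*3.0
```

Because the work scales with the number of occupied points times the number of kernel offsets, rather than with the volume of the 4D grid, a kernel that is sparse along some axes keeps the cost manageable in this sketch.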

Code Repositories

NVIDIA/MinkowskiEngine

pytorch

Mentioned in GitHub

shwoo93/minkowskiengine

pytorch

Mentioned in GitHub

buildingnet/buildingnet_dataset

pytorch

Mentioned in GitHub

ldkong1205/Robo3D

pytorch

Mentioned in GitHub

dkoh0207/lartpc_minkowski

pytorch

Mentioned in GitHub

StanfordVL/MinkowskiEngine

Official

pytorch

Mentioned in GitHub

Pointcept/Pointcept

pytorch

mit-han-lab/spvnas

pytorch

Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
3d-semantic-segmentation-on-scannet-1	MinkowskiNet	Top-1 IoU: 0.292 Top-3 IoU: 0.531
3d-semantic-segmentation-on-scannet200	MinkUNet	test mIoU: 25.3 val mIoU: 25.0
3d-semantic-segmentation-on-scribblekitti	MinkowskiNet	mIoU: 55.0
3d-semantic-segmentation-on-stpls3d	MinkowskiNet	mIoU: 51.3
robust-3d-semantic-segmentation-on	MinkUNet-18	mean Corruption Error (mCE): 100.00%
robust-3d-semantic-segmentation-on	MinkUNet-34	mean Corruption Error (mCE): 100.61%
robust-3d-semantic-segmentation-on-nuscenes-c	MinkUNet-34	mean Corruption Error (mCE): 96.37%
robust-3d-semantic-segmentation-on-nuscenes-c	MinkUNet-18	mean Corruption Error (mCE): 100.00%
robust-3d-semantic-segmentation-on-wod-c	MinkUNet-18	mean Corruption Error (mCE): 100.00%
robust-3d-semantic-segmentation-on-wod-c	MinkUNet-34	mean Corruption Error (mCE): 96.21%
semantic-segmentation-on-s3dis	MinkowskiNet	Mean IoU: 65.4 Number of params: 37.9M
semantic-segmentation-on-s3dis-area5	MinkowskiNet	Number of params: 37.9M mAcc: 71.7 mIoU: 65.4
semantic-segmentation-on-scannet	MinkowskiNet	test mIoU: 73.4 val mIoU: 72.2
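
Most rows above report mean intersection-over-union (mIoU). As a reading aid, the sketch below computes it under the standard definition: per-class IoU is TP / (TP + FP + FN), averaged over classes. Individual benchmarks may differ in details such as ignored labels or how absent classes are handled, so this is an assumption and not any benchmark's official scoring code. The function name `mean_iou` and the toy labels are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Return the mean of per-class IoU over classes that appear in
    either the predictions or the ground truth."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both; skipped in this sketch
        ious.append(tp / denom)
    return sum(ious) / len(ious)


# Toy per-point labels for six points and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(round(100 * score, 1))  # 50.0, from per-class IoUs of 1/3, 2/3, and 1/2
```

The table reports values on a 0 to 100 scale in most rows, which corresponds to multiplying the fraction returned here by 100.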
