
Self-Supervised Monocular Scene Flow Estimation

Junhwa Hur, Stefan Roth

Abstract

Scene flow estimation has been receiving increasing attention for 3D environment perception. Monocular scene flow estimation -- obtaining 3D structure and 3D motion from two temporally consecutive images -- is a highly ill-posed problem, and practical solutions are lacking to date. We propose a novel monocular scene flow method that yields competitive accuracy and real-time performance. By taking an inverse problem view, we design a single convolutional neural network (CNN) that successfully estimates depth and 3D motion simultaneously from a classical optical flow cost volume. We adopt self-supervised learning with 3D loss functions and occlusion reasoning to leverage unlabeled data. We validate our design choices, including the proxy loss and augmentation setup. Our model achieves state-of-the-art accuracy among unsupervised/self-supervised learning approaches to monocular scene flow, and yields competitive results for the optical flow and monocular depth estimation sub-tasks. Semi-supervised fine-tuning further improves the accuracy and yields promising results in real-time.
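The link between the scene flow output (depth plus 3D motion) and the optical flow sub-task can be illustrated with standard pinhole-camera geometry. The sketch below is not taken from the paper or its repository: the function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and it assumes a single pixel with known depth and a 3D motion vector expressed in the camera frame. It back-projects the pixel to 3D, applies the motion, and re-projects to obtain the induced 2D optical flow.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def project(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point onto the image plane (pinhole model)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def scene_flow_to_optical_flow(u, v, depth, scene_flow, fx, fy, cx, cy):
    """Return the 2D flow (du, dv) induced by a 3D motion vector at pixel (u, v).

    Assumption: scene_flow = (dx, dy, dz) is the 3D displacement of the
    surface point between the two frames, in the first camera's frame.
    """
    x, y, z = backproject(u, v, depth, fx, fy, cx, cy)
    dx, dy, dz = scene_flow
    u2, v2 = project((x + dx, y + dy, z + dz), fx, fy, cx, cy)
    return (u2 - u, v2 - v)


if __name__ == "__main__":
    # Hypothetical camera: focal length 100 px, principal point at the origin.
    flow = scene_flow_to_optical_flow(
        u=50.0, v=20.0, depth=10.0, scene_flow=(1.0, 0.0, 0.0),
        fx=100.0, fy=100.0, cx=0.0, cy=0.0,
    )
    print(flow)
```

The same 3D displacement produces a smaller 2D flow at larger depth, which is one reason depth and motion have to be estimated jointly in the monocular setting.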

Code Repositories

visinf/self-mono-sf (official implementation, PyTorch)

Benchmarks

Benchmark	Methodology	D1-all	D2-all	Fl-all	SF-all	Runtime (s)
scene-flow-estimation-on-kitti-2015-scene	Self-Mono-SF	31.25	34.86	23.49	47.05	0.09
scene-flow-estimation-on-kitti-2015-scene-1	Self-Mono-SF	34.02	36.34	23.54	49.54	0.09
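The D1-all, D2-all, Fl-all and SF-all metrics are outlier rates. As a minimal sketch, assuming the usual KITTI 2015 convention (a pixel counts as an outlier when its error exceeds both 3 px and 5% of the ground-truth magnitude, and as a scene flow outlier when it fails any of the three checks), they could be computed as below. The function names and the input layout are hypothetical and are not taken from the official evaluation code.

```python
def is_outlier(err, gt_mag, abs_thresh=3.0, rel_thresh=0.05):
    """True if the error exceeds both the absolute and the relative threshold."""
    return err > abs_thresh and err > rel_thresh * gt_mag


def outlier_rates(pixels):
    """Compute outlier percentages over a list of evaluated pixels.

    Assumed input layout: each pixel is a dict with keys 'd1', 'd2' and 'fl',
    each holding an (error, ground_truth_magnitude) pair in pixels.
    """
    if not pixels:
        raise ValueError("need at least one pixel with ground truth")
    counts = {"D1-all": 0, "D2-all": 0, "Fl-all": 0, "SF-all": 0}
    for p in pixels:
        bad_d1 = is_outlier(*p["d1"])
        bad_d2 = is_outlier(*p["d2"])
        bad_fl = is_outlier(*p["fl"])
        counts["D1-all"] += bad_d1
        counts["D2-all"] += bad_d2
        counts["Fl-all"] += bad_fl
        # A scene flow outlier is a pixel that fails any of the three checks.
        counts["SF-all"] += bad_d1 or bad_d2 or bad_fl
    n = len(pixels)
    return {name: 100.0 * c / n for name, c in counts.items()}


if __name__ == "__main__":
    toy_pixels = [
        {"d1": (1.0, 50.0), "d2": (1.0, 50.0), "fl": (1.0, 10.0)},
        {"d1": (4.0, 50.0), "d2": (1.0, 50.0), "fl": (1.0, 10.0)},
        {"d1": (4.0, 100.0), "d2": (0.0, 100.0), "fl": (5.0, 20.0)},
        {"d1": (0.5, 30.0), "d2": (0.5, 30.0), "fl": (0.5, 5.0)},
    ]
    print(outlier_rates(toy_pixels))
```

Under this definition SF-all can never be lower than any of the other three rates, which matches the pattern in the benchmark rows.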
