Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks

Lu Chenyang ; van de Molengraft Marinus Jacobus Gerardus ; Dubbelman Gijs

Abstract

In this work, we research and evaluate end-to-end learning of monocular semantic-metric occupancy grid mapping from weak binocular ground truth. The network learns to predict four classes, as well as a camera to bird's eye view mapping. At the core, it utilizes a variational encoder-decoder network that encodes the front-view visual information of the driving scene and subsequently decodes it into a 2-D top-view Cartesian coordinate system. The evaluations on Cityscapes show that the end-to-end learning of semantic-metric occupancy grids outperforms the deterministic mapping approach with flat-plane assumption by more than 12% mean IoU. Furthermore, we show that the variational sampling with a relatively small embedding vector brings robustness against vehicle dynamic perturbations, and generalizability for unseen KITTI data. Our network achieves real-time inference rates of approx. 35 Hz for an input image with a resolution of 256x512 pixels and an output map with 64x64 occupancy grid cells using a Titan V GPU.
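The abstract reports results as mean IoU over the predicted classes of the occupancy grid. As a rough illustration of that metric only, the following sketch computes mean intersection-over-union between two label grids. The function name `mean_iou`, the integer class encoding 0 to 3, and the optional `ignore_label` for cells without ground truth are assumptions made for this example; the paper's exact evaluation protocol is not described here.

```python
def mean_iou(pred, target, num_classes=4, ignore_label=None):
    """Mean intersection-over-union between two 2-D label grids.

    pred and target are equally sized lists of rows holding integer
    class labels in [0, num_classes). The label encoding and the
    ignore_label handling are assumptions for illustration.
    """
    intersections = [0] * num_classes
    unions = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if ignore_label is not None and t == ignore_label:
                continue  # skip cells that have no ground truth
            if p == t:
                # Cell counts once toward both intersection and union.
                intersections[p] += 1
                unions[p] += 1
            else:
                # Mismatch: the cell enlarges the union of both classes.
                unions[p] += 1
                unions[t] += 1
    # Average only over classes that occur in prediction or ground truth.
    ious = [i / u for i, u in zip(intersections, unions) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Tiny 2x2 example; a real output map in the abstract has 64x64 cells.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so `score` is their average, 7/12. Classes that appear in neither grid are left out of the average, which is one common convention and may differ from the one used in the paper.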

Benchmarks

Benchmark	Methodology	Metrics
bird-s-eye-view-semantic-segmentation-on	VED	IoU veh - 224x480 - No vis filter - 100x50 at 0.25: 8.8
