Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks
Lu Chenyang ; van de Molengraft Marinus Jacobus Gerardus ; Dubbelman Gijs

Abstract
In this work, we research and evaluate end-to-end learning of monocular semantic-metric occupancy grid mapping from weak binocular ground truth. The network learns to predict four classes, as well as a camera to bird's eye view mapping. At the core, it utilizes a variational encoder-decoder network that encodes the front-view visual information of the driving scene and subsequently decodes it into a 2-D top-view Cartesian coordinate system. The evaluations on Cityscapes show that the end-to-end learning of semantic-metric occupancy grids outperforms the deterministic mapping approach with flat-plane assumption by more than 12% mean IoU. Furthermore, we show that the variational sampling with a relatively small embedding vector brings robustness against vehicle dynamic perturbations, and generalizability for unseen KITTI data. Our network achieves real-time inference rates of approx. 35 Hz for an input image with a resolution of 256x512 pixels and an output map with 64x64 occupancy grid cells using a Titan V GPU.
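The variational sampling mentioned in the abstract is commonly implemented with the reparameterization trick. The following is a minimal sketch of that step only, assuming the encoder outputs a mean and a log-variance vector; the function name `sample_latent` and the embedding size of 32 are hypothetical and are not taken from the paper.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) using the reparameterization trick."""
    # Sample unit Gaussian noise with the same shape as the embedding.
    eps = rng.standard_normal(mu.shape)
    # Scale by the standard deviation and shift by the mean.
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
latent_dim = 32                       # hypothetical embedding size
mu = np.zeros(latent_dim)             # stand-in for the encoder's mean output
log_var = np.full(latent_dim, -2.0)   # stand-in for the encoder's log-variance output
z = sample_latent(mu, log_var, rng)   # embedding that a decoder would consume
```

Writing the sample as a deterministic function of `mu`, `log_var`, and external noise is what allows gradients to flow through the sampling step during training in a framework with automatic differentiation.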
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| bird-s-eye-view-semantic-segmentation-on | VED | IoU veh - 224x480 - No vis filter - 100x50 at 0.25: 8.8 |
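The IoU and mean IoU figures quoted above can be computed along the following lines. This is a minimal sketch, assuming predictions and ground truth are integer class-label grids of equal shape; the function names are hypothetical, and any handling of unlabeled cells or visibility filtering used by the benchmark is not modeled here.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Return the intersection-over-union of each class on a label grid."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            # Class absent from both grids: leave it out of the mean.
            ious.append(float("nan"))
        else:
            inter = np.logical_and(pred_c, target_c).sum()
            ious.append(inter / union)
    return np.array(ious)

def mean_iou(pred, target, num_classes):
    """Average the per-class IoU, ignoring classes absent from both grids."""
    return float(np.nanmean(per_class_iou(pred, target, num_classes)))

# Toy 2x2 grids with four classes (real maps in the abstract are 64x64).
pred = np.array([[0, 1], [2, 3]])
target = np.array([[0, 1], [2, 2]])
print(per_class_iou(pred, target, 4))  # per-class values: 1.0, 1.0, 0.5, 0.0
print(mean_iou(pred, target, 4))       # 0.625
```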