LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation
Florent Bartoccioni, Éloi Zablocki, Andrei Bursuc, Patrick Pérez, Matthieu Cord, Karteek Alahari

Abstract
Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world. Online prediction of these BEV maps involves non-trivial operations such as multi-camera data extraction as well as fusion and projection into a common top-view grid. This is usually done with error-prone geometric operations (e.g., homography or back-projection from monocular depth estimation) or expensive direct dense mapping between image pixels and pixels in BEV (e.g., with MLP or attention). In this work, we present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras. Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations. These latent representations, after being processed by a series of self-attention blocks, are then reprojected with a second cross-attention in the BEV space. We demonstrate that our model outperforms the best previous works using transformers on nuScenes. The code and trained models are available at https://github.com/valeoai/LaRa
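Below is a minimal PyTorch sketch of the encoder-decoder pipeline the abstract describes: a first cross-attention that aggregates multi-camera features into learned latents, a series of self-attention blocks over those latents, and a second cross-attention that projects them onto a BEV grid. All module choices, dimensions, class names, and the handling of the geometric (ray) embeddings are illustrative assumptions, not the paper's exact implementation; see the repository above for the real one.

```python
import torch
import torch.nn as nn


class LaRaSketch(nn.Module):
    """Hypothetical sketch: camera features -> latents -> BEV grid."""

    def __init__(self, feat_dim=256, num_latents=512, latent_dim=256,
                 num_self_blocks=4, num_heads=8, bev_size=50):
        super().__init__()
        self.bev_size = bev_size
        # Learned latent array that aggregates multi-camera information.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        # First cross-attention: latents (queries) attend to image features.
        self.encode_attn = nn.MultiheadAttention(
            latent_dim, num_heads, kdim=feat_dim, vdim=feat_dim,
            batch_first=True)
        # Series of self-attention blocks processing the latents.
        block = nn.TransformerEncoderLayer(
            latent_dim, num_heads, dim_feedforward=4 * latent_dim,
            batch_first=True)
        self.self_blocks = nn.TransformerEncoder(block, num_self_blocks)
        # Learned BEV queries, one per output grid cell.
        self.bev_queries = nn.Parameter(
            torch.randn(bev_size * bev_size, latent_dim))
        # Second cross-attention: BEV queries attend to processed latents.
        self.decode_attn = nn.MultiheadAttention(
            latent_dim, num_heads, batch_first=True)
        self.seg_head = nn.Linear(latent_dim, 1)  # vehicle logit per cell

    def forward(self, cam_feats):
        # cam_feats: (B, N_cams * H * W, feat_dim), image features flattened
        # over all cameras; in the paper these carry geometric ray embeddings.
        b = cam_feats.shape[0]
        lat = self.latents.expand(b, -1, -1)
        lat, _ = self.encode_attn(lat, cam_feats, cam_feats)  # fuse cameras
        lat = self.self_blocks(lat)                           # refine latents
        q = self.bev_queries.expand(b, -1, -1)
        bev, _ = self.decode_attn(q, lat, lat)                # project to BEV
        logits = self.seg_head(bev)                           # (B, S*S, 1)
        return logits.view(b, 1, self.bev_size, self.bev_size)


# Illustrative usage: 6 cameras with 14x30 feature maps of dimension 256.
feats = torch.randn(1, 6 * 14 * 30, 256)
print(LaRaSketch()(feats).shape)  # torch.Size([1, 1, 50, 50])
```

A key property of this design is that the latent bottleneck decouples cost from both the number of input cameras and the output grid resolution: each cross-attention scales with only one of the two.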
Code Repositories
https://github.com/valeoai/LaRa
Benchmarks
| Benchmark | Methodology | Metric | Score |
|---|---|---|---|
| Bird's-Eye-View Semantic Segmentation on nuScenes | LaRa | Vehicle IoU (224×480 input, no visibility filter, 100 m × 100 m grid at 0.5 m) | 35.4 |
| Bird's-Eye-View Semantic Segmentation on nuScenes | LaRa | Vehicle IoU (224×480 input, visibility filter, 100 m × 100 m grid at 0.5 m) | 38.9 |
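The IoU values above compare predicted and ground-truth vehicle masks cell by cell on the BEV grid. A minimal sketch of such a metric follows; the decision threshold and tensor shapes are assumptions (the 0.5 in the benchmark name is the grid resolution in meters, not a threshold), and the benchmark's exact evaluation code may differ.

```python
import torch


def bev_vehicle_iou(pred_logits: torch.Tensor, target: torch.Tensor) -> float:
    """Intersection-over-union of binary BEV vehicle masks.

    pred_logits, target: (B, 1, H, W); target is a {0, 1} vehicle mask.
    """
    pred = pred_logits.sigmoid() > 0.5  # assumed decision threshold
    gt = target.bool()
    inter = (pred & gt).sum().item()
    union = (pred | gt).sum().item()
    return inter / union if union > 0 else 0.0
```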