LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation
Florent Bartoccioni, Éloi Zablocki, Andrei Bursuc, Patrick Pérez, Matthieu Cord, Karteek Alahari

Abstract
Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world. Online prediction of these BEV maps involves non-trivial operations such as multi-camera data extraction as well as fusion and projection into a common top-view grid. This is usually done with error-prone geometric operations (e.g., homography or back-projection from monocular depth estimation) or expensive direct dense mapping between image pixels and pixels in BEV (e.g., with MLP or attention). In this work, we present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras. Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations. These latent representations, after being processed by a series of self-attention blocks, are then reprojected with a second cross-attention in the BEV space. We demonstrate that our model outperforms the best previous works using transformers on nuScenes. The code and trained models are available at https://github.com/valeoai/LaRa
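Below is a minimal PyTorch sketch of the encoder-decoder pipeline the abstract describes: a first cross-attention that aggregates multi-camera features into learned latents, a series of self-attention blocks over those latents, and a second cross-attention that projects them onto a BEV grid. All module choices, dimensions, class names, and the handling of the geometric (ray) embeddings are illustrative assumptions, not the paper's exact implementation; see the repository above for the real one.

```python
import torch
import torch.nn as nn


class LaRaSketch(nn.Module):
    """Hypothetical sketch: camera features -> latents -> BEV grid."""

    def __init__(self, feat_dim=256, num_latents=512, latent_dim=256,
                 num_self_blocks=4, num_heads=8, bev_size=50):
        super().__init__()
        self.bev_size = bev_size
        # Learned latent array that aggregates multi-camera information.
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim))
        # First cross-attention: latents (queries) attend to image features.
        self.encode_attn = nn.MultiheadAttention(
            latent_dim, num_heads, kdim=feat_dim, vdim=feat_dim,
            batch_first=True)
        # Series of self-attention blocks processing the latents.
        block = nn.TransformerEncoderLayer(
            latent_dim, num_heads, dim_feedforward=4 * latent_dim,
            batch_first=True)
        self.self_blocks = nn.TransformerEncoder(block, num_self_blocks)
        # Learned BEV queries, one per output grid cell.
        self.bev_queries = nn.Parameter(
            torch.randn(bev_size * bev_size, latent_dim))
        # Second cross-attention: BEV queries attend to processed latents.
        self.decode_attn = nn.MultiheadAttention(
            latent_dim, num_heads, batch_first=True)
        self.seg_head = nn.Linear(latent_dim, 1)  # vehicle logit per cell

    def forward(self, cam_feats):
        # cam_feats: (B, N_cams * H * W, feat_dim), image features flattened
        # over all cameras; in the paper these carry geometric ray embeddings.
        b = cam_feats.shape[0]
        lat = self.latents.expand(b, -1, -1)
        lat, _ = self.encode_attn(lat, cam_feats, cam_feats)  # fuse cameras
        lat = self.self_blocks(lat)                           # refine latents
        q = self.bev_queries.expand(b, -1, -1)
        bev, _ = self.decode_attn(q, lat, lat)                # project to BEV
        logits = self.seg_head(bev)                           # (B, S*S, 1)
        return logits.view(b, 1, self.bev_size, self.bev_size)


# Illustrative usage: 6 cameras with 14x30 feature maps of dimension 256.
feats = torch.randn(1, 6 * 14 * 30, 256)
print(LaRaSketch()(feats).shape)  # torch.Size([1, 1, 50, 50])
```

A key property of this design is that the latent bottleneck decouples cost from both the number of input cameras and the output grid resolution: each cross-attention scales with only one of the two.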
Code Repositories
https://github.com/valeoai/LaRa
Benchmarks
| Benchmark | Methodology | Metric | Score |
|---|---|---|---|
| Bird's-Eye-View Semantic Segmentation on nuScenes | LaRa | Vehicle IoU (224×480 input, no visibility filter, 100 m × 100 m grid at 0.5 m) | 35.4 |
| Bird's-Eye-View Semantic Segmentation on nuScenes | LaRa | Vehicle IoU (224×480 input, visibility filter, 100 m × 100 m grid at 0.5 m) | 38.9 |
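The IoU values above compare predicted and ground-truth vehicle masks cell by cell on the BEV grid. A minimal sketch of such a metric follows; the decision threshold and tensor shapes are assumptions (the 0.5 in the benchmark name is the grid resolution in meters, not a threshold), and the benchmark's exact evaluation code may differ.

```python
import torch


def bev_vehicle_iou(pred_logits: torch.Tensor, target: torch.Tensor) -> float:
    """Intersection-over-union of binary BEV vehicle masks.

    pred_logits, target: (B, 1, H, W); target is a {0, 1} vehicle mask.
    """
    pred = pred_logits.sigmoid() > 0.5  # assumed decision threshold
    gt = target.bool()
    inter = (pred & gt).sum().item()
    union = (pred | gt).sum().item()
    return inter / union if union > 0 else 0.0
```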