LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation

Florent Bartoccioni, Éloi Zablocki, Andrei Bursuc, Patrick Pérez, Matthieu Cord, Karteek Alahari

Abstract

Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world. Online prediction of these BEV maps involves non-trivial operations such as multi-camera data extraction, as well as fusion and projection into a common top-view grid. This is usually done with error-prone geometric operations (e.g., homography or back-projection from monocular depth estimation) or expensive direct dense mapping between image pixels and pixels in BEV (e.g., with MLP or attention). In this work, we present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras. Our approach uses a system of cross-attention to aggregate information over multiple sensors into a compact, yet rich, collection of latent representations. These latent representations, after being processed by a series of self-attention blocks, are then reprojected with a second cross-attention in the BEV space. We demonstrate that our model outperforms the best previous works using transformers on nuScenes. The code and trained models are available at https://github.com/valeoai/LaRa
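The sketch below illustrates the encoder-decoder pipeline the abstract describes: learned latents cross-attend to multi-camera features, a stack of self-attention blocks processes the latent set, and learned BEV queries cross-attend back to the latents before a per-cell segmentation head. All dimensions, layer counts, and the handling of camera-ray embeddings (assumed to be already fused into the input features) are hypothetical placeholders, not the authors' configuration; see the repository linked above for the actual implementation.

```python
# Minimal PyTorch sketch of a latent-bottleneck encoder-decoder in the spirit
# of LaRa. Hyperparameters and the `cam_feats` format are assumptions.
import torch
import torch.nn as nn

class LaRaSketch(nn.Module):
    def __init__(self, feat_dim=256, num_latents=512, num_self_blocks=4,
                 bev_size=200, num_classes=1):
        super().__init__()
        # Learned latent vectors acting as a compact bottleneck.
        self.latents = nn.Parameter(torch.randn(num_latents, feat_dim))
        # Cross-attention: latents (queries) aggregate multi-camera features.
        self.encode_attn = nn.MultiheadAttention(feat_dim, 8, batch_first=True)
        # Self-attention blocks refining the latent set.
        self.self_blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(feat_dim, 8, batch_first=True)
            for _ in range(num_self_blocks))
        # Learned BEV queries, one per output grid cell (hypothetical design).
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, feat_dim))
        # Second cross-attention: BEV queries read out from the latents.
        self.decode_attn = nn.MultiheadAttention(feat_dim, 8, batch_first=True)
        self.head = nn.Linear(feat_dim, num_classes)
        self.bev_size = bev_size

    def forward(self, cam_feats):
        # cam_feats: (B, N_tokens, feat_dim) -- image features from all cameras,
        # flattened, assumed to already carry camera-ray/geometry embeddings.
        B = cam_feats.shape[0]
        x = self.latents.expand(B, -1, -1)
        x, _ = self.encode_attn(x, cam_feats, cam_feats)  # cameras -> latents
        for blk in self.self_blocks:                      # latent processing
            x = blk(x)
        q = self.bev_queries.expand(B, -1, -1)
        bev, _ = self.decode_attn(q, x, x)                # latents -> BEV grid
        logits = self.head(bev)                           # per-cell class logits
        return logits.transpose(1, 2).reshape(B, -1, self.bev_size, self.bev_size)
```

The latent bottleneck is the key efficiency argument: attention cost scales with the (fixed, small) number of latents rather than with a dense pixel-to-BEV mapping, which is why both the image-to-latent and latent-to-BEV steps stay cheap even for large camera rigs.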

