Brady Zhou; Philipp Krähenbühl

Abstract
We present cross-view transformers, an efficient attention-based model for map-view semantic segmentation from multiple cameras. Our architecture implicitly learns a mapping from individual camera views into a canonical map-view representation using a camera-aware cross-view attention mechanism. Each camera uses positional embeddings that depend on its intrinsic and extrinsic calibration. These embeddings allow a transformer to learn the mapping across different views without ever explicitly modeling it geometrically. The architecture consists of a convolutional image encoder for each view and cross-view transformer layers to infer a map-view semantic segmentation. Our model is simple, easily parallelizable, and runs in real time. The presented architecture performs at state-of-the-art on the nuScenes dataset, with 4x faster inference speeds. Code is available at https://github.com/bradyz/cross_view_transformers.
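To make the idea in the abstract concrete, below is a minimal PyTorch sketch of a single camera-aware cross-view attention layer: learned map-view queries attend over per-camera image features whose keys carry a positional embedding derived from each camera's intrinsic and extrinsic calibration. This is not the authors' implementation (see the linked repository for that); the module name `CrossViewAttention`, the ray-plus-camera-center embedding, and all shapes and arguments are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossViewAttention(nn.Module):
    """Sketch of one cross-view attention layer: learned map-view queries attend over
    per-camera features whose keys carry a calibration-aware positional embedding
    (here, per-pixel ray directions and camera centers expressed in the ego frame)."""

    def __init__(self, dim=128, map_size=25):
        super().__init__()
        self.dim = dim
        # Learned map-view (bird's-eye-view) query grid of map_size x map_size latent cells.
        self.map_queries = nn.Parameter(torch.randn(map_size * map_size, dim))
        # MLP turning (ray direction, camera center) into a camera-aware key embedding.
        self.ray_embed = nn.Sequential(nn.Linear(6, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)

    def camera_embedding(self, K, E, hw):
        """Unproject feature-map pixel centers into rays in the ego frame.

        K: (B, N, 3, 3) intrinsics; E: (B, N, 4, 4) camera-to-ego extrinsics."""
        h, w = hw
        ys, xs = torch.meshgrid(
            torch.arange(h, device=K.device, dtype=K.dtype),
            torch.arange(w, device=K.device, dtype=K.dtype),
            indexing="ij",
        )
        pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)   # (h*w, 3)
        rays_cam = torch.einsum("bnij,pj->bnpi", torch.inverse(K), pix)           # (B, N, h*w, 3)
        rays_ego = torch.einsum("bnij,bnpj->bnpi", E[..., :3, :3], rays_cam)
        rays_ego = F.normalize(rays_ego, dim=-1)
        center = E[..., :3, 3].unsqueeze(2).expand(-1, -1, h * w, -1)             # (B, N, h*w, 3)
        return self.ray_embed(torch.cat([rays_ego, center], dim=-1))              # (B, N, h*w, dim)

    def forward(self, feats, K, E):
        """feats: (B, N, C, H, W) features from N cameras, with C == dim."""
        B, N, C, H, W = feats.shape
        x = feats.flatten(3).transpose(2, 3).reshape(B, N * H * W, C)             # (B, N*H*W, C)
        pos = self.camera_embedding(K, E, (H, W)).reshape(B, N * H * W, -1)
        q = self.to_q(self.map_queries).unsqueeze(0).expand(B, -1, -1)            # (B, Q, C)
        k = self.to_k(x + pos)                                                    # keys see calibration
        v = self.to_v(x)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.dim ** 0.5, dim=-1)     # (B, Q, N*H*W)
        return attn @ v                                                           # (B, Q, C) map-view features


# Toy usage: 6 cameras, 128-channel features at 28x60, identity calibration.
layer = CrossViewAttention(dim=128, map_size=25)
feats = torch.randn(2, 6, 128, 28, 60)
K = torch.eye(3).repeat(2, 6, 1, 1)
E = torch.eye(4).repeat(2, 6, 1, 1)
out = layer(feats, K, E)  # (2, 625, 128): one feature per map-view cell
```

In the full model, the output map-view features would be refined by further transformer layers and decoded into the semantic segmentation; here only the attention step is sketched.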
Code Repositories
- https://github.com/bradyz/cross_view_transformers
Benchmarks
| Benchmark | Methodology | Metric | IoU |
|---|---|---|---|
| bird-s-eye-view-semantic-segmentation-on | CVT | IoU veh. - 224x480 - No vis. filter - 100x100 at 0.5 | 31.4 |
| bird-s-eye-view-semantic-segmentation-on | CVT | IoU veh. - 224x480 - Vis. filter - 100x100 at 0.5 | 36.0 |
| bird-s-eye-view-semantic-segmentation-on | CVT | IoU veh. - 448x800 - No vis. filter - 100x100 at 0.5 | 32.5 |
| bird-s-eye-view-semantic-segmentation-on | CVT | IoU veh. - 448x800 - Vis. filter - 100x100 at 0.5 | 37.7 |