8 months ago

Abstract

3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, in which each BEV query extracts spatial features from its regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state of the art of 56.9% on the NDS metric on the nuScenes test set, which is 9.0 points higher than the previous best methods and on par with the performance of LiDAR-based baselines. We further show that BEVFormer remarkably improves the accuracy of velocity estimation and the recall of objects under low-visibility conditions. The code is available at https://github.com/zhiqi-li/BEVFormer.
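To make the idea of grid-shaped BEV queries looking up regions of interest across camera views more concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes an idealized pinhole camera placed at the ego origin, samples a few heights above each BEV cell, and only reports which views a cell projects into. All names (`bev_grid_centers`, `project_to_camera`, `views_hit`) and all numeric values are hypothetical, and the attention computation itself is omitted.

```python
import math


def bev_grid_centers(grid_h, grid_w, cell_size):
    """Return the (x, y) centre of every BEV cell in metres, row-major.

    The grid is centred on the ego vehicle (an assumption of this sketch).
    """
    centers = []
    for row in range(grid_h):
        for col in range(grid_w):
            x = (col - grid_w / 2 + 0.5) * cell_size
            y = (row - grid_h / 2 + 0.5) * cell_size
            centers.append((x, y))
    return centers


def project_to_camera(point, yaw, focal, width, height):
    """Project an ego-frame 3D point into a hypothetical pinhole camera.

    The camera sits at the ego origin and looks along heading `yaw`
    (radians). Returns the pixel (u, v), or None if the point is behind
    the camera or outside the image.
    """
    x, y, z = point
    depth = x * math.cos(yaw) + y * math.sin(yaw)      # along the view axis
    lateral = -x * math.sin(yaw) + y * math.cos(yaw)   # positive to the left
    if depth <= 0.1:
        return None
    u = width / 2 - focal * lateral / depth
    v = height / 2 - focal * z / depth
    if 0 <= u < width and 0 <= v < height:
        return (u, v)
    return None


def views_hit(center, heights, cameras):
    """Map camera name -> projected pixels for one BEV cell.

    Each cell is lifted to several heights (relative to the camera), and
    only cameras that see at least one lifted point are kept.
    """
    hits = {}
    for name, cam in cameras.items():
        pixels = []
        for z in heights:
            pixel = project_to_camera((center[0], center[1], z), **cam)
            if pixel is not None:
                pixels.append(pixel)
        if pixels:
            hits[name] = pixels
    return hits


# Two hypothetical cameras: one facing forward, one facing backward.
CAMERAS = {
    "front": {"yaw": 0.0, "focal": 500.0, "width": 1000, "height": 600},
    "back": {"yaw": math.pi, "focal": 500.0, "width": 1000, "height": 600},
}

centers = bev_grid_centers(grid_h=4, grid_w=4, cell_size=5.0)
hits = views_hit((10.0, 0.0), heights=[-1.0, 0.0, 1.0], cameras=CAMERAS)
```

In this toy setup, a cell 10 m ahead of the vehicle projects only into the front camera, so a query for that cell would need to gather image features from that single view rather than from all views.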

Li Zhiqi ; Wang Wenhai ; Li Hongyang ; Xie Enze ; Sima Chonghao ; Lu Tong ; Yu Qiao ; Dai Jifeng

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
