CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

Kaixin Xiong Shi Gong Xiaoqing Ye Xiao Tan Ji Wan Errui Ding Jingdong Wang Xiang Bai

Abstract

In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We argue that letting 2D image features interact directly with global 3D PE can make the view transformation harder to learn, because the camera extrinsics vary. We therefore propose a novel method based on CAmera view Position Embedding, called CAPE. We form the 3D position embeddings in the local camera-view coordinate system instead of the global coordinate system, so that the 3D position embedding does not have to encode camera extrinsic parameters. Furthermore, we extend CAPE to temporal modeling by exploiting the object queries of previous frames and encoding the ego-motion to boost 3D object detection. CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset. Code and models are available in Paddle3D (https://github.com/PaddlePaddle/Paddle3D) and as a PyTorch implementation (https://github.com/kaixinbear/CAPE).
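The contrast between camera-view and global 3D points can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `camera_view_points` and `to_global`, the intrinsic matrix, and both extrinsic matrices are hypothetical values chosen for illustration, and the learned embedding network that would consume these points is omitted. The sketch only shows that points unprojected with intrinsics alone are identical for two cameras that share intrinsics, whereas their global-frame counterparts differ with the extrinsics.

```python
import numpy as np


def camera_view_points(K, pixels_uv, depths):
    """Unproject pixels at candidate depths into the camera frame.

    Uses only the 3x3 intrinsic matrix K, so the result is independent
    of where the camera is mounted. Returns an array of shape (N, D, 3).
    """
    ones = np.ones((pixels_uv.shape[0], 1))
    uv1 = np.concatenate([pixels_uv, ones], axis=1)   # (N, 3) homogeneous pixels
    rays = uv1 @ np.linalg.inv(K).T                   # (N, 3) rays with z = 1
    return rays[:, None, :] * depths[None, :, None]   # scale each ray by each depth


def to_global(points_cam, cam_to_global):
    """Map camera-frame points to the global frame with a 4x4 extrinsic matrix."""
    R, t = cam_to_global[:3, :3], cam_to_global[:3, 3]
    return points_cam @ R.T + t


# Hypothetical shared intrinsics and two hypothetical camera poses.
K = np.array([[1000.0, 0.0, 800.0],
              [0.0, 1000.0, 450.0],
              [0.0, 0.0, 1.0]])
pose_a = np.eye(4)
pose_b = np.array([[0.0, 0.0, 1.0, 1.5],   # 90 degree rotation about the y axis
                   [0.0, 1.0, 0.0, 0.0],   # plus a translation
                   [-1.0, 0.0, 0.0, 0.5],
                   [0.0, 0.0, 0.0, 1.0]])

pixels = np.array([[800.0, 450.0], [1000.0, 450.0]])
depths = np.array([10.0, 20.0])

# Camera-view points: the same array serves both cameras.
pts_cam = camera_view_points(K, pixels, depths)

# Global points: one array per camera, each depending on that camera's pose.
pts_global_a = to_global(pts_cam, pose_a)
pts_global_b = to_global(pts_cam, pose_b)
```

In this toy setup, an embedding computed from `pts_cam` sees the same input for every camera, while one computed from the global points must also absorb the differences between `pose_a` and `pose_b`.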

Code Repositories

kaixinbear/CAPE (Official, PyTorch, mentioned in GitHub)
PaddlePaddle/Paddle3D (Official, Paddle)

Benchmarks

Benchmark: 3d-object-detection-on-nuscenes-camera-only
Methodology: CAPE
Metrics: Future Frame: false; NDS: 62.8
