MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu

Abstract
Monocular 3D object detection is an important yet challenging task in autonomous driving. Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance caused by inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module that implicitly learns depth-aware features with auxiliary supervision without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module that globally integrates context- and depth-aware features. Moreover, different from conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers. Our proposed depth-aware modules can be easily plugged into existing image-only monocular 3D object detectors to improve performance. Extensive experiments on the KITTI dataset demonstrate that our approach outperforms previous state-of-the-art monocular-based methods and achieves real-time detection. Code is available at https://github.com/kuanchihhuang/MonoDTR
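The abstract describes the depth positional encoding (DPE) as injecting depth hints, rather than pixel coordinates, into the transformer. A minimal NumPy sketch of that idea, assuming one common realization (discretize per-pixel depth into bins and look up a per-bin embedding that is added to the image features); the bin count, embedding dimension, and depth range below are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

num_bins, embed_dim = 16, 8   # illustrative hyperparameters
max_depth = 60.0              # assumed depth range in meters
# One embedding vector per depth bin; learnable parameters in a real model.
bin_embeddings = rng.normal(size=(num_bins, embed_dim))

def depth_positional_encoding(depth_map):
    """Map each pixel's depth to its bin embedding: (H, W) -> (H, W, embed_dim)."""
    bins = (depth_map / max_depth * num_bins).astype(int)
    bins = np.clip(bins, 0, num_bins - 1)      # guard against out-of-range depths
    return bin_embeddings[bins]                # fancy indexing does the lookup

depth = rng.uniform(0.0, max_depth, size=(4, 4))   # toy predicted depth map
features = rng.normal(size=(4, 4, embed_dim))      # toy image feature map
tokens = features + depth_positional_encoding(depth)  # depth-aware transformer input
print(tokens.shape)  # (4, 4, 8)
```

Pixels at similar depths thus receive the same positional hint regardless of their 2D location, which is the distinction the abstract draws against conventional pixel-wise encodings.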
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| 3d-object-detection-from-monocular-images-on-7 | MonoDTR | AP25: 39.76, AP50: 3.02 |