MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu

Abstract
Monocular 3D object detection is an important yet challenging task in autonomous driving. Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance caused by inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transformer network for monocular 3D object detection. It mainly consists of two components: (1) the Depth-Aware Feature Enhancement (DFE) module that implicitly learns depth-aware features with auxiliary supervision without requiring extra computation, and (2) the Depth-Aware Transformer (DTR) module that globally integrates context- and depth-aware features. Moreover, different from conventional pixel-wise positional encodings, we introduce a novel depth positional encoding (DPE) to inject depth positional hints into transformers. Our proposed depth-aware modules can be easily plugged into existing image-only monocular 3D object detectors to improve performance. Extensive experiments on the KITTI dataset demonstrate that our approach outperforms previous state-of-the-art monocular-based methods and achieves real-time detection. Code is available at https://github.com/kuanchihhuang/MonoDTR
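The abstract describes the depth positional encoding (DPE) as injecting depth hints, rather than pixel coordinates, into the transformer. A minimal NumPy sketch of that idea, assuming one common realization (discretize per-pixel depth into bins and look up a per-bin embedding that is added to the image features); the bin count, embedding dimension, and depth range below are illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

num_bins, embed_dim = 16, 8   # illustrative hyperparameters
max_depth = 60.0              # assumed depth range in meters
# One embedding vector per depth bin; learnable parameters in a real model.
bin_embeddings = rng.normal(size=(num_bins, embed_dim))

def depth_positional_encoding(depth_map):
    """Map each pixel's depth to its bin embedding: (H, W) -> (H, W, embed_dim)."""
    bins = (depth_map / max_depth * num_bins).astype(int)
    bins = np.clip(bins, 0, num_bins - 1)      # guard against out-of-range depths
    return bin_embeddings[bins]                # fancy indexing does the lookup

depth = rng.uniform(0.0, max_depth, size=(4, 4))   # toy predicted depth map
features = rng.normal(size=(4, 4, embed_dim))      # toy image feature map
tokens = features + depth_positional_encoding(depth)  # depth-aware transformer input
print(tokens.shape)  # (4, 4, 8)
```

Pixels at similar depths thus receive the same positional hint regardless of their 2D location, which is the distinction the abstract draws against conventional pixel-wise encodings.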
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| 3d-object-detection-from-monocular-images-on-7 | MonoDTR | AP25: 39.76, AP50: 3.02 |