3D Dual-Fusion: Dual-Domain Dual-Query Camera-LiDAR Fusion for 3D Object Detection
Yecheol Kim; Konyul Park; Minwook Kim; Dongsuk Kum; Jun Won Choi

Abstract
Fusing data from cameras and LiDAR sensors is an essential technique for robust 3D object detection. One key challenge in camera-LiDAR fusion is mitigating the large domain gap between the two sensors, in terms of both coordinates and data distribution, when fusing their features. In this paper, we propose a novel camera-LiDAR fusion architecture called 3D Dual-Fusion, which is designed to mitigate the gap between the feature representations of camera and LiDAR data. The proposed method fuses the features of the camera-view and 3D voxel-view domains and models their interactions through deformable attention. We redesign the transformer fusion encoder to aggregate information from the two domains. The two major changes are 1) dual query-based deformable attention, which fuses the dual-domain features interactively, and 2) 3D local self-attention, which encodes the voxel-domain queries prior to dual-query decoding. The results of an experimental evaluation show that the proposed camera-LiDAR fusion architecture achieved competitive performance on the KITTI and nuScenes datasets, with state-of-the-art performance in some 3D object detection benchmark categories.
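To make the core mechanism concrete, the sketch below shows a minimal, single-head form of deformable cross-attention in the spirit of the paper's dual-query fusion: each voxel-domain query, given a reference point on the camera-view feature map (e.g., the projection of its voxel center), predicts a few sampling offsets and attention weights, then aggregates the bilinearly sampled camera features. All names, shapes, and hyperparameters (such as `num_points` and the offset scale) are illustrative assumptions, not the authors' released code.

```python
# Minimal single-head deformable cross-attention sketch (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableCrossAttention(nn.Module):
    def __init__(self, dim: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        # Each query predicts (x, y) offsets and a weight per sample point.
        self.offset_head = nn.Linear(dim, num_points * 2)
        self.weight_head = nn.Linear(dim, num_points)
        self.value_proj = nn.Conv2d(dim, dim, kernel_size=1)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, ref_points, feat_map):
        # queries:    (B, N, C)  voxel-domain queries
        # ref_points: (B, N, 2)  reference points in [0, 1] image coordinates,
        #             e.g., voxel centers projected into the camera view
        # feat_map:   (B, C, H, W) camera-view feature map
        B, N, C = queries.shape
        value = self.value_proj(feat_map)

        # Per-query sampling offsets (bounded, normalized) and attention weights.
        offsets = self.offset_head(queries).view(B, N, self.num_points, 2)
        weights = self.weight_head(queries).softmax(dim=-1)          # (B, N, P)

        # Sampling locations mapped to [-1, 1] for grid_sample.
        locs = (ref_points.unsqueeze(2) + 0.05 * offsets.tanh()) * 2.0 - 1.0

        # Bilinearly sample camera features at each location: (B, C, N, P).
        sampled = F.grid_sample(value, locs, align_corners=False)

        # Weighted sum over the sampled points, then project back to queries.
        fused = (sampled * weights.unsqueeze(1)).sum(dim=-1)          # (B, C, N)
        return self.out_proj(fused.transpose(1, 2))                   # (B, N, C)

# Toy usage: 8 voxel queries attending to a 32x32 camera feature map.
attn = DeformableCrossAttention(dim=64)
q = torch.randn(2, 8, 64)
ref = torch.rand(2, 8, 2)
fmap = torch.randn(2, 64, 32, 32)
print(attn(q, ref, fmap).shape)  # torch.Size([2, 8, 64])
```

In the full architecture this cross-domain step would be paired with a symmetric camera-domain query path and preceded by 3D local self-attention over the voxel queries; the sketch covers only the sampling-and-weighting core of deformable attention.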
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-object-detection-on-kitti-cars-easy | 3D Dual-Fusion | AP: 91.01% |
| 3d-object-detection-on-kitti-cars-hard | 3D Dual-Fusion | AP: 79.39% |
| 3d-object-detection-on-nuscenes | 3D Dual-Fusion_T | NDS: 0.73, mAP: 0.71, mATE: 0.26, mASE: 0.24, mAOE: 0.33, mAVE: 0.27, mAAE: 0.13 |