Yizhou Wang, Tim Meinhardt, Orcun Cetintas, Cheng-Yen Yang, Sameer Satish Pusegaonkar, Benjamin Missaoui, Sujit Biswas, Zheng Tang, Laura Leal-Taixé

Abstract
Object perception from multi-view cameras is crucial for intelligent systems, particularly in indoor environments such as warehouses, retail stores, and hospitals. Most traditional multi-target multi-camera (MTMC) detection and tracking methods rely on 2D object detection, single-view multi-object tracking (MOT), and cross-view re-identification (ReID) techniques, without properly exploiting 3D information through multi-view image aggregation. In this paper, we propose a 3D object detection and tracking framework, named MCBLT, which first aggregates multi-view images with the necessary camera calibration parameters to obtain 3D object detections in bird's-eye view (BEV). We then introduce hierarchical graph neural networks (GNNs) to track these 3D detections in BEV and produce MTMC tracking results. Unlike existing methods, MCBLT generalizes well across different scenes and diverse camera settings, with exceptional capability for long-term association. As a result, our proposed MCBLT establishes a new state-of-the-art on the AICity'24 dataset with $81.22$ HOTA and on the WildTrack dataset with $95.6$ IDF1.
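The multi-view aggregation step described above relies on camera calibration to lift per-image detections onto the ground plane in BEV. A minimal sketch of the underlying geometry, assuming a standard pinhole camera with world-to-camera extrinsics $[R\,|\,t]$ and a flat ground plane at $z = 0$ (the function name and setup are illustrative, not from the paper):

```python
import numpy as np

def pixel_to_ground(K, R, t, u, v):
    """Back-project pixel (u, v) to the world ground plane (z = 0).

    Assumes a pinhole camera: x_cam = R @ X_world + t, p = K @ x_cam.
    The pixel defines a ray from the camera center; we intersect that
    ray with the plane z = 0 to recover the BEV position.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray, camera frame
    ray_world = R.T @ ray_cam                           # same ray, world frame
    cam_center = -R.T @ t                               # camera center in world coords
    s = -cam_center[2] / ray_world[2]                   # scale so the ray reaches z = 0
    return cam_center + s * ray_world                   # 3D point on the ground plane

# Example: camera 5 m above the origin, looking straight down.
K = np.array([[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
t = np.array([0.0, 0.0, 5.0])
print(pixel_to_ground(K, R, t, 20.0, -40.0))  # ground point below that pixel
```

Applying this per camera maps detections from all views into one shared BEV coordinate frame, where they can be fused and tracked.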
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| multi-object-tracking-on-2024-ai-city | BEV-SUSHI | AssA: 76.19 DetA: 86.94 HOTA: 81.22 LocA: 95.67 |
| multi-object-tracking-on-wildtrack | BEV-SUSHI | IDF1: 95.6 MOTA: 92.6 |
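For reading the table: at a single localization threshold, HOTA is the geometric mean of detection accuracy (DetA) and association accuracy (AssA). A minimal sketch of that relationship; note the reported overall HOTA averages over localization thresholds, so it need not exactly equal the geometric mean of the averaged sub-scores:

```python
import math

def hota_single_alpha(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold alpha is the geometric
    mean of detection accuracy and association accuracy."""
    return math.sqrt(det_a * ass_a)

# AICity'24 sub-scores from the table above (percentages).
print(round(hota_single_alpha(86.94, 76.19), 2))  # close to the reported 81.22 HOTA
```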