Yizhou Wang, Tim Meinhardt, Orcun Cetintas, Cheng-Yen Yang, Sameer Satish Pusegaonkar, Benjamin Missaoui, Sujit Biswas, Zheng Tang, Laura Leal-Taixé

Abstract
Object perception from multi-view cameras is crucial for intelligent systems, particularly in indoor environments such as warehouses, retail stores, and hospitals. Most traditional multi-target multi-camera (MTMC) detection and tracking methods rely on 2D object detection, single-view multi-object tracking (MOT), and cross-view re-identification (ReID) techniques, without properly exploiting 3D information through multi-view image aggregation. In this paper, we propose a 3D object detection and tracking framework, named MCBLT, which first aggregates multi-view images with the necessary camera calibration parameters to obtain 3D object detections in bird's-eye view (BEV). We then introduce hierarchical graph neural networks (GNNs) to track these 3D detections in BEV and produce MTMC tracking results. Unlike existing methods, MCBLT generalizes well across different scenes and diverse camera settings, with exceptional capability for long-term association. As a result, our proposed MCBLT establishes a new state-of-the-art on the AICity'24 dataset with $81.22$ HOTA and on the WildTrack dataset with $95.6$ IDF1.
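The multi-view aggregation step described above relies on camera calibration to lift per-image detections onto the ground plane in BEV. A minimal sketch of the underlying geometry, assuming a standard pinhole camera with world-to-camera extrinsics $[R\,|\,t]$ and a flat ground plane at $z = 0$ (the function name and setup are illustrative, not from the paper):

```python
import numpy as np

def pixel_to_ground(K, R, t, u, v):
    """Back-project pixel (u, v) to the world ground plane (z = 0).

    Assumes a pinhole camera: x_cam = R @ X_world + t, p = K @ x_cam.
    The pixel defines a ray from the camera center; we intersect that
    ray with the plane z = 0 to recover the BEV position.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # viewing ray, camera frame
    ray_world = R.T @ ray_cam                           # same ray, world frame
    cam_center = -R.T @ t                               # camera center in world coords
    s = -cam_center[2] / ray_world[2]                   # scale so the ray reaches z = 0
    return cam_center + s * ray_world                   # 3D point on the ground plane

# Example: camera 5 m above the origin, looking straight down.
K = np.array([[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
t = np.array([0.0, 0.0, 5.0])
print(pixel_to_ground(K, R, t, 20.0, -40.0))  # ground point below that pixel
```

Applying this per camera maps detections from all views into one shared BEV coordinate frame, where they can be fused and tracked.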
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| multi-object-tracking-on-2024-ai-city | BEV-SUSHI | AssA: 76.19 DetA: 86.94 HOTA: 81.22 LocA: 95.67 |
| multi-object-tracking-on-wildtrack | BEV-SUSHI | IDF1: 95.6 MOTA: 92.6 |
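For reading the table: at a single localization threshold, HOTA is the geometric mean of detection accuracy (DetA) and association accuracy (AssA). A minimal sketch of that relationship; note the reported overall HOTA averages over localization thresholds, so it need not exactly equal the geometric mean of the averaged sub-scores:

```python
import math

def hota_single_alpha(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold alpha is the geometric
    mean of detection accuracy and association accuracy."""
    return math.sqrt(det_a * ass_a)

# AICity'24 sub-scores from the table above (percentages).
print(round(hota_single_alpha(86.94, 76.19), 2))  # close to the reported 81.22 HOTA
```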