3 months ago

Multiple Object Tracking as ID Prediction

Ruopeng Gao, Ji Qi, Limin Wang

Abstract

Multi-Object Tracking (MOT) has been a long-standing challenge in video understanding. A natural and intuitive approach is to split the task into two parts: object detection and association. Most mainstream methods rely on meticulously crafted heuristics to maintain trajectory information and to compute cost matrices for object matching. Although these methods can achieve notable tracking performance, they often require a series of elaborate handcrafted modifications when facing complicated scenarios. We believe that manually assumed priors limit a method's adaptability and flexibility in learning optimal tracking capabilities from domain-specific data. Therefore, we introduce a new perspective that treats Multiple Object Tracking as an in-context ID Prediction task, turning object association into an end-to-end trainable task. Based on this, we propose a simple yet effective method termed MOTIP. Given a set of trajectories that carry ID information, MOTIP directly decodes the ID labels for the current detections to accomplish association. Without tailored or sophisticated architectures, our method achieves state-of-the-art results across multiple benchmarks while using only object-level features as tracking cues. The simplicity and strong results of MOTIP leave substantial room for future advances, making it a promising baseline for subsequent research. Our code and checkpoints are released at https://github.com/MCG-NJU/MOTIP.
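To make the "association as ID prediction" idea concrete, here is a minimal sketch in plain Python. It is not the MOTIP implementation: a maximum dot-product score stands in for the learned decoder, and the names `predict_ids`, `NEWBORN`, and `newborn_score` are hypothetical. It only illustrates the framing from the abstract, where each current detection is classified into one of the ID labels carried by the historical trajectories, with one extra label for unseen objects. Unlike a full tracker, it does not stop two detections from receiving the same ID.

```python
import math

NEWBORN = "new"  # hypothetical label for a detection that matches no known ID


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_ids(trajectories, detections, newborn_score=0.5):
    """Classify each detection into an in-context ID label.

    trajectories: dict mapping ID label -> list of historical feature vectors
    detections:   list of feature vectors for the current frame
    newborn_score: assumed fixed score for the NEWBORN label (a stand-in
                   for whatever a trained model would learn)
    """
    labels = list(trajectories) + [NEWBORN]
    predictions = []
    for det in detections:
        # Stand-in for a learned decoder: score each ID by its best-matching
        # historical feature.
        scores = [max(dot(det, h) for h in trajectories[label])
                  for label in labels[:-1]]
        scores.append(newborn_score)
        probs = softmax(scores)
        best = max(range(len(labels)), key=probs.__getitem__)
        predictions.append(labels[best])
    return predictions


# Toy example: two known trajectories and three current detections.
trajectories = {1: [[1.0, 0.0], [0.9, 0.1]], 2: [[0.0, 1.0]]}
detections = [[0.1, 0.95], [0.95, 0.05], [-1.0, -1.0]]
print(predict_ids(trajectories, detections))  # [2, 1, 'new']
```

The point of the sketch is that the output is a label drawn from the IDs present in the context, not a cost matrix to be solved by a separate matching step.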

Code Repositories

MCG-NJU/MOTIP (official implementation, PyTorch): https://github.com/MCG-NJU/MOTIP

Benchmarks

| Benchmark | Methodology | AssA | DetA | HOTA | IDF1 | MOTA |
|---|---|---|---|---|---|---|
| DanceTrack | MOTIP (DAB-Deformable DETR) | 60.8 | 80.8 | 70.0 | 75.1 | 91.0 |
| DanceTrack | MOTIP (Deformable DETR, with CrowdHuman) | 62.8 | 81.3 | 71.4 | 76.3 | 91.6 |
| DanceTrack | MOTIP (Deformable DETR, with DanceTrack val and CrowdHuman) | 65.9 | 82.6 | 73.7 | 78.4 | 92.7 |
| DanceTrack | MOTIP (Deformable DETR) | 57.6 | 79.4 | 67.5 | 72.2 | 90.3 |
| MOT17 | MOTIP (Deformable-DETR) | - | - | 59.2 | - | - |
| SportsMOT | MOTIP (Deformable DETR, with SportsMOT val) | 65.4 | 86.5 | 75.2 | 78.2 | 96.1 |
| SportsMOT | MOTIP (Deformable DETR) | 62.0 | 83.4 | 71.9 | 75.0 | 92.9 |

A dash means the metric is not listed for that entry. The MOT17 entry is additionally marked e2e-MOT: Yes.
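As a reading aid for the numbers above: at any single localization threshold, HOTA is defined as the geometric mean of DetA and AssA. The reported HOTA, DetA, and AssA are each averaged over a range of thresholds, so the geometric mean of the reported DetA and AssA only approximates the reported HOTA. The short check below, a sketch with a hypothetical helper name `approx_hota`, shows how closely the listed values follow that relation.

```python
import math

# (entry, DetA, AssA, reported HOTA), copied from the benchmark listing above.
rows = [
    ("DanceTrack, DAB-Deformable DETR", 80.8, 60.8, 70.0),
    ("DanceTrack, Deformable DETR + CrowdHuman", 81.3, 62.8, 71.4),
    ("DanceTrack, Deformable DETR + val + CrowdHuman", 82.6, 65.9, 73.7),
    ("DanceTrack, Deformable DETR", 79.4, 57.6, 67.5),
    ("SportsMOT, Deformable DETR + val", 86.5, 65.4, 75.2),
    ("SportsMOT, Deformable DETR", 83.4, 62.0, 71.9),
]


def approx_hota(det_a, ass_a):
    """Geometric mean of DetA and AssA; exact only at a single threshold."""
    return math.sqrt(det_a * ass_a)


for name, det_a, ass_a, hota in rows:
    estimate = approx_hota(det_a, ass_a)
    print(f"{name}: reported {hota:.1f}, sqrt(DetA*AssA) = {estimate:.2f}")
```

Because DetA is already around 80 or higher in every row, the differences in HOTA between entries come mostly from AssA, the association side that ID prediction targets.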
