| MOTIP (Deformable DETR, with DanceTrack val and CrowdHuman) | 65.9 | 73.7 | 78.4 | 92.7 | Multiple Object Tracking as ID Prediction | |
| MOTIP (Deformable DETR, with CrowdHuman) | 62.8 | 71.4 | 76.3 | 91.6 | Multiple Object Tracking as ID Prediction | |
| MOTIP (DAB-Deformable DETR) | 60.8 | 70.0 | 75.1 | 91.0 | Multiple Object Tracking as ID Prediction | |
| MOTIP (Deformable DETR) | 57.6 | 67.5 | 72.2 | 90.3 | Multiple Object Tracking as ID Prediction | |