Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation
Ke Lei ; Li Xia ; Danelljan Martin ; Tai Yu-Wing ; Tang Chi-Keung ; Yu Fisher

Abstract
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single frame predictions for the segmentation mask itself. We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from the past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both Youtube-VIS and BDD100K datasets, and shows efficacy to both one-stage and two-stage segmentation frameworks. Code and video resources are available at http://vis.xyz/pub/pcan.
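The two steps the abstract names, distilling a space-time memory into prototypes and then reading from them with cross-attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the soft k-means clustering, the function names `distill_prototypes` and `cross_attend`, the `temperature` parameter, and all feature shapes are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def distill_prototypes(memory, num_prototypes, iters=5, temperature=1.0, seed=0):
    """Condense memory features (N, C) into prototypes (K, C).

    Assumption: soft k-means stands in for whatever clustering
    the paper actually uses; it is only a placeholder here.
    """
    rng = np.random.default_rng(seed)
    # Initialise prototypes from randomly chosen memory features.
    idx = rng.choice(memory.shape[0], size=num_prototypes, replace=False)
    prototypes = memory[idx].copy()
    for _ in range(iters):
        # Soft-assign every memory feature to the prototypes: (N, K).
        assign = softmax(memory @ prototypes.T / temperature, axis=1)
        # Update each prototype as the assignment-weighted mean of the memory.
        weights = assign / (assign.sum(axis=0, keepdims=True) + 1e-8)
        prototypes = weights.T @ memory
    return prototypes


def cross_attend(query, prototypes):
    """Read from the prototypes with scaled dot-product cross-attention.

    query: (M, C) features of the current frame.
    prototypes: (K, C) distilled memory.
    Returns the attended features (M, C) and the attention map (M, K).
    """
    scale = np.sqrt(query.shape[1])
    attn = softmax(query @ prototypes.T / scale, axis=1)
    return attn @ prototypes, attn


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    memory = rng.normal(size=(200, 16))  # hypothetical past-frame features
    query = rng.normal(size=(50, 16))    # hypothetical current-frame features
    protos = distill_prototypes(memory, num_prototypes=8)
    out, attn = cross_attend(query, protos)
    print(protos.shape, out.shape, attn.shape)
```

In this sketch the attention map has shape (M, K) rather than (M, N), so the cost of reading from the past scales with the number of prototypes instead of the number of stored memory features.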
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| multi-object-tracking-and-segmentation-on-3 | QDTrack-mots-fix | mMOTSA: 23.5 |
| multi-object-tracking-and-segmentation-on-3 | SortIoU | mMOTSA: 10.3 |
| multi-object-tracking-and-segmentation-on-3 | MaskTrackRCNN | mMOTSA: 12.3 |
| multi-object-tracking-and-segmentation-on-3 | PCAN | mMOTSA: 27.4 |
| multi-object-tracking-and-segmentation-on-3 | QDTrack-mots | mMOTSA: 22.5 |
| multi-object-tracking-and-segmentation-on-3 | STEm-Seg | mMOTSA: 12.2 |
| multiple-object-track-and-segmentation-on-2 | PCAN | mMOTSA: 27.4 |
| video-instance-segmentation-on-bdd100k-val | QDTrack-mots-fix | mMOTSA: 23.5 |
| video-instance-segmentation-on-bdd100k-val | QDTrack-mots | mMOTSA: 22.5 |
| video-instance-segmentation-on-bdd100k-val | STEm-Seg | mMOTSA: 12.2 |
| video-instance-segmentation-on-bdd100k-val | MaskTrackRCNN | mMOTSA: 12.3 |
| video-instance-segmentation-on-bdd100k-val | SortIoU | mMOTSA: 10.3 |
| video-instance-segmentation-on-bdd100k-val | PCAN | mMOTSA: 27.4 |
| video-instance-segmentation-on-youtube-vis-1 | PCAN(ResNet-50) | AP50: 54.9, AP75: 39.4, AR1: 36.3, AR10: 41.6, mask AP: 36.1 |