
Abstract
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes. Most approaches only exploit the temporal dimension to address the association problem, while relying on single-frame predictions for the segmentation mask itself. We propose the Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online for multiple object tracking and segmentation. PCAN first distills a space-time memory into a set of prototypes and then employs cross-attention to retrieve rich information from past frames. To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time. Extensive experiments demonstrate that PCAN outperforms current video instance tracking and segmentation competition winners on both the YouTube-VIS and BDD100K datasets, and is effective for both one-stage and two-stage segmentation frameworks. Code and video resources are available at http://vis.xyz/pub/pcan.
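The core mechanism described above has two steps: condensing the space-time memory into a small set of prototypes, and letting current-frame features attend over those prototypes instead of the full memory. Below is a minimal, illustrative PyTorch sketch of this idea, assuming prototypes are distilled with a few soft k-means (EM-style) iterations; all function and variable names are our own invention for illustration, not the official PCAN implementation (see SysCV/pcan for that).

```python
import torch
import torch.nn.functional as F


def distill_prototypes(memory, num_prototypes=16, em_iters=3):
    """Condense memory features (N, C) into (num_prototypes, C) via soft k-means."""
    # Initialize prototypes from a random subset of the memory features.
    idx = torch.randperm(memory.size(0))[:num_prototypes]
    prototypes = memory[idx].clone()
    for _ in range(em_iters):
        # E-step: soft-assign each memory feature to the prototypes.
        logits = memory @ prototypes.t()                     # (N, K)
        assign = F.softmax(logits, dim=1)                    # (N, K)
        # M-step: prototypes become the assignment-weighted feature means.
        prototypes = assign.t() @ memory                     # (K, C)
        prototypes = prototypes / assign.sum(0).unsqueeze(1).clamp(min=1e-6)
    return prototypes


def prototype_cross_attention(query, prototypes):
    """Read temporal context: queries (M, C) attend over prototypes (K, C)."""
    scale = query.size(1) ** -0.5
    attn = F.softmax(query @ prototypes.t() * scale, dim=1)  # (M, K)
    return attn @ prototypes                                 # (M, C)


# Toy usage: 1000 pooled past-frame features, 100 current-frame query features.
memory = torch.randn(1000, 256)
query = torch.randn(100, 256)
protos = distill_prototypes(memory, num_prototypes=16)
context = prototype_cross_attention(query, protos)
print(context.shape)  # torch.Size([100, 256])
```

Attending over K prototypes rather than all N memory entries reduces the attention cost from O(M·N) to O(M·K), which is what makes the online use of a long space-time memory tractable.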
Code Repositories
SysCV/pcan
Official
pytorch
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| multi-object-tracking-and-segmentation-on-3 | PCAN | mMOTSA: 27.4 |
| multi-object-tracking-and-segmentation-on-3 | QDTrack-mots-fix | mMOTSA: 23.5 |
| multi-object-tracking-and-segmentation-on-3 | QDTrack-mots | mMOTSA: 22.5 |
| multi-object-tracking-and-segmentation-on-3 | MaskTrackRCNN | mMOTSA: 12.3 |
| multi-object-tracking-and-segmentation-on-3 | STEm-Seg | mMOTSA: 12.2 |
| multi-object-tracking-and-segmentation-on-3 | SortIoU | mMOTSA: 10.3 |
| multiple-object-track-and-segmentation-on-2 | PCAN | mMOTSA: 27.4 |
| video-instance-segmentation-on-bdd100k-val | PCAN | mMOTSA: 27.4 |
| video-instance-segmentation-on-bdd100k-val | QDTrack-mots-fix | mMOTSA: 23.5 |
| video-instance-segmentation-on-bdd100k-val | QDTrack-mots | mMOTSA: 22.5 |
| video-instance-segmentation-on-bdd100k-val | MaskTrackRCNN | mMOTSA: 12.3 |
| video-instance-segmentation-on-bdd100k-val | STEm-Seg | mMOTSA: 12.2 |
| video-instance-segmentation-on-bdd100k-val | SortIoU | mMOTSA: 10.3 |
| video-instance-segmentation-on-youtube-vis-1 | PCAN(ResNet-50) | AP50: 54.9 AP75: 39.4 AR1: 36.3 AR10: 41.6 mask AP: 36.1 |