
Abstract
This paper presents a simple yet effective approach to modeling space-time correspondences for video object segmentation. Unlike most existing methods, we establish correspondences directly between frames without re-encoding the mask features for every object, yielding an efficient and robust framework. With these correspondences, every node in the current query frame can be inferred by associatively aggregating features from past frames. We cast this aggregation as a voting problem and find that the existing inner-product affinity leads to poor use of memory: a small (fixed) subset of memory nodes dominates the votes regardless of the query. To address this, we propose computing affinities with the negative squared Euclidean distance instead. We verify that every memory node then has a chance to contribute to the vote, and show experimentally that such diversified voting substantially improves both memory efficiency and inference accuracy. The synergy of the correspondence network and diversified voting works exceedingly well, achieving new state-of-the-art results on both the DAVIS and YouTubeVOS datasets while running at over 20 FPS for multi-object scenarios without bells and whistles.
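The switch from inner-product to negative squared Euclidean affinity can be sketched in a few lines. The snippet below is a minimal NumPy illustration (the official repository uses PyTorch; the function names here are illustrative, not from the codebase). It exploits the identity \(-\lVert k-q\rVert^2 = 2\,k\cdot q - \lVert k\rVert^2 - \lVert q\rVert^2\): the \(\lVert q\rVert^2\) term is constant for each query and cancels under the softmax over memory nodes, so only the memory-key norms need to be subtracted from the dot product.

```python
import numpy as np

def l2_affinity(mk: np.ndarray, qk: np.ndarray) -> np.ndarray:
    """Negative squared Euclidean affinity, up to a per-query constant.

    mk: memory keys, shape (N, C); qk: query keys, shape (M, C).
    Returns an (N, M) affinity matrix equivalent (under a softmax over
    the memory axis) to -||mk[i] - qk[j]||^2.
    """
    dot = mk @ qk.T                               # (N, M) inner products
    mk_sq = (mk ** 2).sum(axis=1, keepdims=True)  # (N, 1) memory-key norms
    # 2 k.q - ||k||^2; the -||q||^2 term is dropped (softmax-invariant).
    return 2.0 * dot - mk_sq

def memory_readout_weights(mk: np.ndarray, qk: np.ndarray) -> np.ndarray:
    """Softmax over memory nodes -> voting weights for each query node."""
    aff = l2_affinity(mk, qk)
    aff = aff - aff.max(axis=0, keepdims=True)    # numerical stability
    e = np.exp(aff)
    return e / e.sum(axis=0, keepdims=True)       # columns sum to 1
```

Because the affinity now penalizes memory keys with large norms, no fixed subset of nodes can dominate every query's vote, which is the diversified-voting behavior described above.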
Code Repositories
alipga/AMM_VOS
pytorch
Mentioned on GitHub
limingxing00/rde-vos-cvpr2022
pytorch
Mentioned on GitHub
hkchengrex/STCN
Official
pytorch
Mentioned on GitHub
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| semi-supervised-video-object-segmentation-on-1 | STCN | F-measure (Decay): 10.3 F-measure (Mean): 83.5 F-measure (Recall): 89.7 J&F: 79.9 Jaccard (Decay): 10.5 Jaccard (Mean): 76.3 Jaccard (Recall): 85.5 |
| semi-supervised-video-object-segmentation-on-18 | STCN | F-Measure (Seen): 87.0 F-Measure (Unseen): 87.7 Jaccard (Seen): 82.6 Jaccard (Unseen): 79.4 Overall: 84.2 |
| semi-supervised-video-object-segmentation-on-18 | STCN (MS) | F-Measure (Seen): 87.8 F-Measure (Unseen): 88.8 Jaccard (Seen): 83.5 Jaccard (Unseen): 80.8 Overall: 85.2 |
| semi-supervised-video-object-segmentation-on-21 | STCN | F: 55.0 J: 46.6 J&F: 50.8 |
| video-object-segmentation-on-youtube-vos | STCN | F-Measure (Seen): 87.9 F-Measure (Unseen): 87.3 Jaccard (Seen): 83.2 Jaccard (Unseen): 79.0 |
| video-object-segmentation-on-youtube-vos-2019-2 | STCN | F-Measure (Seen): 85.4 F-Measure (Unseen): 85.9 Jaccard (Seen): 81.1 Jaccard (Unseen): 78.2 Mean Jaccard & F-Measure: 82.7 |
| visual-object-tracking-on-davis-2016 | STCN | F-measure (Decay): 4.3 F-measure (Mean): 93.0 F-measure (Recall): 97.1 J&F: 91.7 Jaccard (Decay): 4.1 Jaccard (Mean): 90.4 Jaccard (Recall): 98.1 Speed (FPS): 26.9 |
| visual-object-tracking-on-davis-2017 | STCN | F-measure (Decay): 85.3 F-measure (Mean): 88.6 F-measure (Recall): 94.6 J&F: 85.3 Jaccard (Decay): 6.2 Jaccard (Mean): 82.0 Jaccard (Recall): 91.3 Speed (FPS): 20.2 |