
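Abstract
We present XMem, a video object segmentation architecture for long videos with unified feature memory stores, inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically uses only one type of feature memory. For videos longer than a minute, a single feature memory model tightly couples memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact, and therefore sustained, long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay in long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (which do not work on long videos) on short-video datasets. Code is available at https://hkchengrex.github.io/XMem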
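The consolidation idea in the abstract can be illustrated with a toy sketch. This is not the XMem implementation: the class names, the `usage` counter, the capacity limit, and the top-k promotion rule are all assumptions made for illustration. The sketch only shows the general pattern of moving frequently read working-memory elements into a compact long-term store once working memory grows too large.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryElement:
    frame_index: int
    feature: List[float]
    usage: float = 0.0  # hypothetical counter: how often this element was read


@dataclass
class ToyMemory:
    working_capacity: int = 4  # hypothetical limit on working-memory size
    keep_top_k: int = 2        # hypothetical number of elements promoted per consolidation
    working: List[MemoryElement] = field(default_factory=list)
    long_term: List[MemoryElement] = field(default_factory=list)

    def add(self, element: MemoryElement) -> None:
        """Store a new element; consolidate when working memory overflows."""
        self.working.append(element)
        if len(self.working) > self.working_capacity:
            self.consolidate()

    def read(self, index: int) -> List[float]:
        """Read one working-memory element and record that it was used."""
        element = self.working[index]
        element.usage += 1.0
        return element.feature

    def consolidate(self) -> None:
        """Promote the most-used older elements to long-term memory, drop the rest."""
        *candidates, latest = self.working
        candidates.sort(key=lambda e: e.usage, reverse=True)  # stable sort
        self.long_term.extend(candidates[: self.keep_top_k])
        self.working = [latest]  # only the newest element stays in working memory


# Usage: fill working memory, read some elements, then trigger consolidation.
mem = ToyMemory()
for i in range(4):
    mem.add(MemoryElement(frame_index=i, feature=[float(i)]))
mem.read(1)
mem.read(1)
mem.read(3)
mem.add(MemoryElement(frame_index=4, feature=[4.0]))  # fifth element overflows

promoted = [e.frame_index for e in mem.long_term]
remaining = [e.frame_index for e in mem.working]
print(promoted, remaining)
```

In this toy version the long-term store grows by at most `keep_top_k` elements per consolidation, so its size is bounded by how often consolidation runs rather than by the number of frames seen.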
Code repositories
tianyuan168326/videosemanticcompression-pytorch
pytorch
Mentioned in GitHub
hkchengrex/XMem
Official
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| semi-supervised-video-object-segmentation-on-1 | XMem (BL30K) | F-measure (Mean): 84.7; J&F: 81.2; Jaccard (Mean): 77.6 |
| semi-supervised-video-object-segmentation-on-1 | XMem (BL30K, MS) | F-measure (Mean): 87.0; J&F: 83.7; Jaccard (Mean): 80.5 |
| semi-supervised-video-object-segmentation-on-1 | XMem (BL30K, 600p) | F-measure (Mean): 85.8; J&F: 82.5; Jaccard (Mean): 79.1 |
| semi-supervised-video-object-segmentation-on-1 | XMem (MS) | F-measure (Mean): 86.4; J&F: 83.1; Jaccard (Mean): 79.7 |
| semi-supervised-video-object-segmentation-on-1 | XMem | F-measure (Mean): 84.5; J&F: 81.0; Jaccard (Mean): 77.4 |
| semi-supervised-video-object-segmentation-on-1 | XMem (DAVIS and YouTubeVOS only) | F-measure (Mean): 83.4; J&F: 79.8; Jaccard (Mean): 76.3 |
| semi-supervised-video-object-segmentation-on-13 | XMem | F: 91.6±0.2; J: 88.0±0.2; J&F: 89.8±0.2 |
| semi-supervised-video-object-segmentation-on-14 | XMem | F: 91.8±0.4; J: 88.2±0.3; J&F: 90.0±0.4 |
| semi-supervised-video-object-segmentation-on-18 | XMem (BL30K, MS) | F-Measure (Seen): 89.8; F-Measure (Unseen): 89.9; Jaccard (Seen): 85.5; Jaccard (Unseen): 81.8; Overall: 86.8 |
| semi-supervised-video-object-segmentation-on-18 | XMem (MS) | F-Measure (Seen): 89.2; F-Measure (Unseen): 89.8; Jaccard (Seen): 84.9; Jaccard (Unseen): 81.8; Overall: 86.4 |
| semi-supervised-video-object-segmentation-on-18 | XMem | F-Measure (Seen): 88.0; F-Measure (Unseen): 87.1; Jaccard (Seen): 83.6; Jaccard (Unseen): 78.5; Overall: 84.3 |
| semi-supervised-video-object-segmentation-on-18 | XMem (BL30K) | F-Measure (Seen): 89.2; F-Measure (Unseen): 88.8; Jaccard (Seen): 84.8; Jaccard (Unseen): 80.3; Overall: 85.8 |
| semi-supervised-video-object-segmentation-on-20 | XMem | FPS: 29.6 |
| semi-supervised-video-object-segmentation-on-21 | XMem | F: 62.0; J: 53.3; J&F: 57.6 |
| video-object-segmentation-on-davis-2016 | XMem (BL30K, MS) | F-Score: 94.4; J&F: 93.3; Jaccard (Mean): 92.2 |
| video-object-segmentation-on-davis-2016 | XMem | F-Score: 92.7; J&F: 91.5; Jaccard (Mean): 90.4 |
| video-object-segmentation-on-davis-2017-test | XMem | F-measure: 84.5; Jaccard: 77.4; Mean Jaccard & F-Measure: 81.0 |
| video-object-segmentation-on-davis-2017-test | XMem (BL30K, MS) | F-measure: 87.0; Jaccard: 80.5; Mean Jaccard & F-Measure: 83.7 |
| video-object-segmentation-on-davis-2017-val | XMem | F-measure: 89.5; Jaccard: 82.9; Mean Jaccard & F-Measure: 86.2 |
| video-object-segmentation-on-davis-2017-val | XMem (BL30K, MS) | F-measure: 92.6; Jaccard: 86.3; Mean Jaccard & F-Measure: 89.5 |
| video-object-segmentation-on-youtube-vos | XMem (MS) | F-Measure (Seen): 89.9; F-Measure (Unseen): 89.9; Jaccard (Seen): 85.3; Jaccard (Unseen): 81.7; Overall: 86.7 |
| video-object-segmentation-on-youtube-vos | XMem | F-Measure (Seen): 89.3; F-Measure (Unseen): 88.7; Jaccard (Seen): 84.6; Jaccard (Unseen): 80.2; Overall: 85.7; Speed (FPS): 22.6 |
| video-object-segmentation-on-youtube-vos | XMem (YouTubeVOS only) | F-Measure (Seen): 88.5; F-Measure (Unseen): 87.2; Jaccard (Seen): 83.7; Jaccard (Unseen): 78.2; Overall: 84.4; Speed (FPS): 22.6 |
| video-object-segmentation-on-youtube-vos | XMem (BL30K, MS) | F-Measure (Seen): 90.3; F-Measure (Unseen): 90.2; Jaccard (Seen): 85.6; Jaccard (Unseen): 81.7; Overall: 86.9 |
| video-object-segmentation-on-youtube-vos | XMem (BL30K) | F-Measure (Seen): 89.8; F-Measure (Unseen): 89.2; Jaccard (Seen): 85.1; Jaccard (Unseen): 80.3; Overall: 86.1; Speed (FPS): 22.6 |
| video-object-segmentation-on-youtube-vos-1 | XMem (BL30K, MS) | F-Measure (Seen): 90.3; F-Measure (Unseen): 90.2; Jaccard (Seen): 85.6; Jaccard (Unseen): 81.7; Mean Jaccard & F-Measure: 86.9 |
| video-object-segmentation-on-youtube-vos-2019-2 | XMem | F-Measure (Seen): 88.6; F-Measure (Unseen): 88.6; Jaccard (Seen): 84.3; Jaccard (Unseen): 80.3; Mean Jaccard & F-Measure: 85.5 |
| video-object-segmentation-on-youtube-vos-2019-2 | XMem (BL30K, MS) | F-Measure (Seen): 89.8; F-Measure (Unseen): 89.9; Jaccard (Seen): 85.5; Jaccard (Unseen): 81.8; Mean Jaccard & F-Measure: 86.8 |
| visual-object-tracking-on-davis-2016 | XMem (DAVIS+YouTubeVOS only) | F-measure (Mean): 91.9; J&F: 90.8; Jaccard (Mean): 89.6; Speed (FPS): 29.6 |
| visual-object-tracking-on-davis-2016 | XMem (MS) | F-measure (Mean): 93.5; J&F: 92.7; Jaccard (Mean): 92.0 |
| visual-object-tracking-on-davis-2016 | XMem (DAVIS only) | F-measure (Mean): 88.9; J&F: 87.8; Jaccard (Mean): 86.7; Speed (FPS): 29.6 |
| visual-object-tracking-on-davis-2016 | XMem (BL30K, MS) | F-measure (Mean): 94.4; J&F: 93.3; Jaccard (Mean): 92.2 |
| visual-object-tracking-on-davis-2016 | XMem (BL30K) | F-measure (Mean): 93.2; J&F: 92.0; Jaccard (Mean): 90.7; Speed (FPS): 29.6 |
| visual-object-tracking-on-davis-2016 | XMem | F-measure (Mean): 92.7; J&F: 91.5; Jaccard (Mean): 90.4; Speed (FPS): 29.6 |
| visual-object-tracking-on-davis-2017 | XMem (MS) | F-measure (Mean): 91.0; J&F: 88.2; Jaccard (Mean): 85.4 |
| visual-object-tracking-on-davis-2017 | XMem (BL30K, MS) | F-measure (Mean): 92.6; J&F: 89.5; Jaccard (Mean): 86.3 |
| visual-object-tracking-on-davis-2017 | XMem (DAVIS only) | F-measure (Mean): 79.3; J&F: 76.7; Jaccard (Mean): 74.1; Speed (FPS): 22.6 |
| visual-object-tracking-on-davis-2017 | XMem | F-measure (Mean): 89.5; J&F: 86.2; Jaccard (Mean): 82.9; Speed (FPS): 22.6 |
| visual-object-tracking-on-davis-2017 | XMem (BL30K) | F-measure (Mean): 91.4; J&F: 87.7; Jaccard (Mean): 84.0; Speed (FPS): 22.6 |
| visual-object-tracking-on-davis-2017 | XMem (DAVIS and YouTubeVOS only) | F-measure (Mean): 87.6; J&F: 84.5; Jaccard (Mean): 81.4; Speed (FPS): 22.6 |
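As a reading aid for the combined scores in the table, the short sketch below recomputes two of them from their components. It assumes the usual conventions that the combined Jaccard and F-measure score is the arithmetic mean of J and F, and that the YouTube-VOS overall score averages the four seen/unseen values. The function names are illustrative. Because the table reports rounded components, a recomputed mean may differ from the tabulated one by up to about 0.1.

```python
def mean_j_and_f(jaccard: float, f_measure: float) -> float:
    """Combined score, assumed to be the arithmetic mean of J and F."""
    return (jaccard + f_measure) / 2.0


def youtube_vos_overall(j_seen: float, j_unseen: float,
                        f_seen: float, f_unseen: float) -> float:
    """Overall score, assumed to average the four seen/unseen values."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4.0


# Values from the XMem row of video-object-segmentation-on-davis-2017-val.
davis_val_jf = mean_j_and_f(jaccard=82.9, f_measure=89.5)

# Values from the XMem row of video-object-segmentation-on-youtube-vos.
ytvos_overall = youtube_vos_overall(j_seen=84.6, j_unseen=80.2,
                                    f_seen=89.3, f_unseen=88.7)

print(round(davis_val_jf, 1), round(ytvos_overall, 1))
```

Under these assumptions, both recomputed values agree with the tabulated 86.2 and 85.7 to one decimal place.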