
XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model


Abstract

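We present XMem, a video object segmentation architecture for long videos with unified feature memory stores, inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically uses only one type of feature memory. For videos longer than a minute, a single feature memory model tightly couples memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact and therefore sustained long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay in long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (which do not work on long videos) on short-video datasets. Code is available at https://hkchengrex.github.io/XMem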
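The consolidation idea in the abstract (frequently used working memory elements are moved into a bounded long-term store) can be illustrated with a toy sketch. This is not the XMem implementation: the class names, the capacities, the usage counter, and the eviction rule are all assumptions made for illustration, and real features are replaced by string keys.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryElement:
    key: str        # hypothetical stand-in for a stored feature
    usage: int = 0  # number of times this element has been read


@dataclass
class ToyMemory:
    # Capacities are illustrative values, not numbers from the paper.
    working_capacity: int = 4
    num_consolidated: int = 2
    long_term_capacity: int = 3
    working: list = field(default_factory=list)
    long_term: list = field(default_factory=list)

    def add(self, key: str) -> None:
        """Store a new element; consolidate once working memory overflows."""
        self.working.append(MemoryElement(key))
        if len(self.working) > self.working_capacity:
            self.consolidate()

    def read(self, key: str) -> None:
        """Record one use of every stored element with this key."""
        for element in self.working + self.long_term:
            if element.key == key:
                element.usage += 1

    def consolidate(self) -> None:
        """Move the most-used working elements to long-term memory."""
        ranked = sorted(self.working, key=lambda e: e.usage, reverse=True)
        self.long_term.extend(ranked[: self.num_consolidated])
        self.working = []  # the remaining working elements are discarded
        # Keep long-term memory bounded by evicting its least-used elements.
        self.long_term.sort(key=lambda e: e.usage, reverse=True)
        del self.long_term[self.long_term_capacity:]


memory = ToyMemory()
for frame in ["f0", "f1", "f2", "f3"]:
    memory.add(frame)
memory.read("f1")
memory.read("f1")
memory.read("f3")
memory.add("f4")  # fifth element exceeds the capacity and triggers consolidation
print([e.key for e in memory.long_term])  # ['f1', 'f3']
```

Because both stores are capped, the total number of stored elements stays bounded regardless of video length, which is the property the abstract describes as avoiding memory explosion.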

Code Repository

hkchengrex/XMem
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Method | Metrics
semi-supervised-video-object-segmentation-on-1 | XMem (BL30K) | F-measure (Mean): 84.7; J&F: 81.2; Jaccard (Mean): 77.6
semi-supervised-video-object-segmentation-on-1 | XMem (BL30K, MS) | F-measure (Mean): 87.0; J&F: 83.7; Jaccard (Mean): 80.5
semi-supervised-video-object-segmentation-on-1 | XMem (BL30K, 600p) | F-measure (Mean): 85.8; J&F: 82.5; Jaccard (Mean): 79.1
semi-supervised-video-object-segmentation-on-1 | XMem (MS) | F-measure (Mean): 86.4; J&F: 83.1; Jaccard (Mean): 79.7
semi-supervised-video-object-segmentation-on-1 | XMem | F-measure (Mean): 84.5; J&F: 81.0; Jaccard (Mean): 77.4
semi-supervised-video-object-segmentation-on-1 | XMem (DAVIS and YouTubeVOS only) | F-measure (Mean): 83.4; J&F: 79.8; Jaccard (Mean): 76.3
semi-supervised-video-object-segmentation-on-13 | XMem | F: 91.6±0.2; J: 88.0±0.2; J&F: 89.8±0.2
semi-supervised-video-object-segmentation-on-14 | XMem | F: 91.8±0.4; J: 88.2±0.3; J&F: 90.0±0.4
semi-supervised-video-object-segmentation-on-18 | XMem (BL30K, MS) | F-Measure (Seen): 89.8; F-Measure (Unseen): 89.9; Jaccard (Seen): 85.5; Jaccard (Unseen): 81.8; Overall: 86.8
semi-supervised-video-object-segmentation-on-18 | XMem (MS) | F-Measure (Seen): 89.2; F-Measure (Unseen): 89.8; Jaccard (Seen): 84.9; Jaccard (Unseen): 81.8; Overall: 86.4
semi-supervised-video-object-segmentation-on-18 | XMem | F-Measure (Seen): 88.0; F-Measure (Unseen): 87.1; Jaccard (Seen): 83.6; Jaccard (Unseen): 78.5; Overall: 84.3
semi-supervised-video-object-segmentation-on-18 | XMem (BL30K) | F-Measure (Seen): 89.2; F-Measure (Unseen): 88.8; Jaccard (Seen): 84.8; Jaccard (Unseen): 80.3; Overall: 85.8
semi-supervised-video-object-segmentation-on-20 | XMem | FPS: 29.6
semi-supervised-video-object-segmentation-on-21 | XMem | F: 62.0; J: 53.3; J&F: 57.6
video-object-segmentation-on-davis-2016 | XMem (BL30K, MS) | F-Score: 94.4; J&F: 93.3; Jaccard (Mean): 92.2
video-object-segmentation-on-davis-2016 | XMem | F-Score: 92.7; J&F: 91.5; Jaccard (Mean): 90.4
video-object-segmentation-on-davis-2017-test | XMem | F-measure: 84.5; Jaccard: 77.4; Mean Jaccard & F-Measure: 81.0
video-object-segmentation-on-davis-2017-test | XMem (BL30K, MS) | F-measure: 87.0; Jaccard: 80.5; Mean Jaccard & F-Measure: 83.7
video-object-segmentation-on-davis-2017-val | XMem | F-measure: 89.5; Jaccard: 82.9; Mean Jaccard & F-Measure: 86.2
video-object-segmentation-on-davis-2017-val | XMem (BL30K, MS) | F-measure: 92.6; Jaccard: 86.3; Mean Jaccard & F-Measure: 89.5
video-object-segmentation-on-youtube-vos | XMem (MS) | F-Measure (Seen): 89.9; F-Measure (Unseen): 89.9; Jaccard (Seen): 85.3; Jaccard (Unseen): 81.7; Overall: 86.7
video-object-segmentation-on-youtube-vos | XMem | F-Measure (Seen): 89.3; F-Measure (Unseen): 88.7; Jaccard (Seen): 84.6; Jaccard (Unseen): 80.2; Overall: 85.7; Speed (FPS): 22.6
video-object-segmentation-on-youtube-vos | XMem (YouTubeVOS only) | F-Measure (Seen): 88.5; F-Measure (Unseen): 87.2; Jaccard (Seen): 83.7; Jaccard (Unseen): 78.2; Overall: 84.4; Speed (FPS): 22.6
video-object-segmentation-on-youtube-vos | XMem (BL30K, MS) | F-Measure (Seen): 90.3; F-Measure (Unseen): 90.2; Jaccard (Seen): 85.6; Jaccard (Unseen): 81.7; Overall: 86.9
video-object-segmentation-on-youtube-vos | XMem (BL30K) | F-Measure (Seen): 89.8; F-Measure (Unseen): 89.2; Jaccard (Seen): 85.1; Jaccard (Unseen): 80.3; Overall: 86.1; Speed (FPS): 22.6
video-object-segmentation-on-youtube-vos-1 | XMem (BL30K, MS) | F-Measure (Seen): 90.3; F-Measure (Unseen): 90.2; Jaccard (Seen): 85.6; Jaccard (Unseen): 81.7; Mean Jaccard & F-Measure: 86.9
video-object-segmentation-on-youtube-vos-2019-2 | XMem | F-Measure (Seen): 88.6; F-Measure (Unseen): 88.6; Jaccard (Seen): 84.3; Jaccard (Unseen): 80.3; Mean Jaccard & F-Measure: 85.5
video-object-segmentation-on-youtube-vos-2019-2 | XMem (BL30K, MS) | F-Measure (Seen): 89.8; F-Measure (Unseen): 89.9; Jaccard (Seen): 85.5; Jaccard (Unseen): 81.8; Mean Jaccard & F-Measure: 86.8
visual-object-tracking-on-davis-2016 | XMem (DAVIS+YouTubeVOS only) | F-measure (Mean): 91.9; J&F: 90.8; Jaccard (Mean): 89.6; Speed (FPS): 29.6
visual-object-tracking-on-davis-2016 | XMem (MS) | F-measure (Mean): 93.5; J&F: 92.7; Jaccard (Mean): 92.0
visual-object-tracking-on-davis-2016 | XMem (DAVIS only) | F-measure (Mean): 88.9; J&F: 87.8; Jaccard (Mean): 86.7; Speed (FPS): 29.6
visual-object-tracking-on-davis-2016 | XMem (BL30K, MS) | F-measure (Mean): 94.4; J&F: 93.3; Jaccard (Mean): 92.2
visual-object-tracking-on-davis-2016 | XMem (BL30K) | F-measure (Mean): 93.2; J&F: 92.0; Jaccard (Mean): 90.7; Speed (FPS): 29.6
visual-object-tracking-on-davis-2016 | XMem | F-measure (Mean): 92.7; J&F: 91.5; Jaccard (Mean): 90.4; Speed (FPS): 29.6
visual-object-tracking-on-davis-2017 | XMem (MS) | F-measure (Mean): 91.0; J&F: 88.2; Jaccard (Mean): 85.4
visual-object-tracking-on-davis-2017 | XMem (BL30K, MS) | F-measure (Mean): 92.6; J&F: 89.5; Jaccard (Mean): 86.3
visual-object-tracking-on-davis-2017 | XMem (DAVIS only) | F-measure (Mean): 79.3; J&F: 76.7; Jaccard (Mean): 74.1; Speed (FPS): 22.6
visual-object-tracking-on-davis-2017 | XMem | F-measure (Mean): 89.5; J&F: 86.2; Jaccard (Mean): 82.9; Speed (FPS): 22.6
visual-object-tracking-on-davis-2017 | XMem (BL30K) | F-measure (Mean): 91.4; J&F: 87.7; Jaccard (Mean): 84.0; Speed (FPS): 22.6
visual-object-tracking-on-davis-2017 | XMem (DAVIS and YouTubeVOS only) | F-measure (Mean): 87.6; J&F: 84.5; Jaccard (Mean): 81.4; Speed (FPS): 22.6
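
For readers unfamiliar with the metrics listed above: J (Jaccard) is the intersection over union between a predicted and a ground-truth mask, and J&F is the average of the mean J and the mean F (boundary) scores. The sketch below shows both computations on plain nested lists. The function names are hypothetical, the empty-mask convention is an assumption, and the boundary F-measure itself is not reimplemented here.

```python
def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pairs = [(p, g) for prow, grow in zip(pred, gt) for p, g in zip(prow, grow)]
    intersection = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def j_and_f(j_mean, f_mean):
    """Overall score: the average of the mean Jaccard and mean F-measure."""
    return (j_mean + f_mean) / 2


pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
print(jaccard(pred, gt))  # 2 shared pixels / 4 pixels in the union = 0.5

# Using the first row of the table: Jaccard (Mean) 77.6, F-measure (Mean) 84.7.
print(j_and_f(77.6, 84.7))  # about 81.15, matching the reported J&F of 81.2
```

The reported J&F values are computed from unrounded per-sequence scores, so averaging the rounded J and F columns can differ from the listed J&F in the last digit.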
