5 months ago

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Ho Kei Cheng; Alexander G. Schwing

Abstract

We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact thus sustained long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets. Code is available at https://hkchengrex.github.io/XMem
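The interplay of the three stores can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyMemory`, the `Element` record, the capacities, and the rule "keep the most-read elements" are all assumptions made for illustration, and string labels stand in for feature tensors. It only shows the general idea of moving actively used working memory elements into a bounded long-term store so that total memory does not grow with video length.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str        # hypothetical label standing in for a stored feature
    usage: int = 0   # number of times this element has been read


class ToyMemory:
    """Toy three-store memory; all names and capacities are assumptions."""

    def __init__(self, working_capacity=4, keep_top=2, long_term_capacity=3):
        self.sensory = None      # overwritten on every frame
        self.working = []        # recent, detailed elements
        self.long_term = []      # compact store of consolidated elements
        self.working_capacity = working_capacity
        self.keep_top = keep_top
        self.long_term_capacity = long_term_capacity

    def step(self, name):
        """Process one frame: refresh sensory memory, then store a working element."""
        self.sensory = name
        if len(self.working) >= self.working_capacity:
            self.consolidate()
        self.working.append(Element(name))

    def read(self, name):
        """Record a read so that frequently used elements can be identified."""
        for element in self.working + self.long_term:
            if element.name == name:
                element.usage += 1

    def consolidate(self):
        """Move the most-used working elements to long-term memory, drop the rest."""
        ranked = sorted(self.working, key=lambda e: e.usage, reverse=True)
        self.long_term.extend(ranked[: self.keep_top])
        self.working = []
        # Keep the long-term store bounded by evicting its least-used elements.
        self.long_term.sort(key=lambda e: e.usage, reverse=True)
        del self.long_term[self.long_term_capacity:]


memory = ToyMemory()
for frame in ["f0", "f1", "f2", "f3"]:
    memory.step(frame)
memory.read("f1")
memory.read("f1")
memory.read("f3")
memory.step("f4")  # working memory is full, so consolidation runs first
print([e.name for e in memory.long_term])  # prints ['f1', 'f3']
```

In this sketch the working store never exceeds `working_capacity` and the long-term store never exceeds `long_term_capacity`, so memory use stays constant however many frames are processed. The selection and eviction rules in the released code may differ.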

Code Repositories

hkchengrex/XMem (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
semi-supervised-video-object-segmentation-on-1 | XMem (BL30K) | F-measure (Mean): 84.7; J&F: 81.2; Jaccard (Mean): 77.6
semi-supervised-video-object-segmentation-on-1 | XMem (BL30K, MS) | F-measure (Mean): 87.0; J&F: 83.7; Jaccard (Mean): 80.5
semi-supervised-video-object-segmentation-on-1 | XMem (BL30K, 600p) | F-measure (Mean): 85.8; J&F: 82.5; Jaccard (Mean): 79.1
semi-supervised-video-object-segmentation-on-1 | XMem (MS) | F-measure (Mean): 86.4; J&F: 83.1; Jaccard (Mean): 79.7
semi-supervised-video-object-segmentation-on-1 | XMem | F-measure (Mean): 84.5; J&F: 81.0; Jaccard (Mean): 77.4
semi-supervised-video-object-segmentation-on-1 | XMem (DAVIS and YouTubeVOS only) | F-measure (Mean): 83.4; J&F: 79.8; Jaccard (Mean): 76.3
semi-supervised-video-object-segmentation-on-13 | XMem | F: 91.6±0.2; J: 88.0±0.2; J&F: 89.8±0.2
semi-supervised-video-object-segmentation-on-14 | XMem | F: 91.8±0.4; J: 88.2±0.3; J&F: 90.0±0.4
semi-supervised-video-object-segmentation-on-18 | XMem (BL30K, MS) | F-Measure (Seen): 89.8; F-Measure (Unseen): 89.9; Jaccard (Seen): 85.5; Jaccard (Unseen): 81.8; Overall: 86.8
semi-supervised-video-object-segmentation-on-18 | XMem (MS) | F-Measure (Seen): 89.2; F-Measure (Unseen): 89.8; Jaccard (Seen): 84.9; Jaccard (Unseen): 81.8; Overall: 86.4
semi-supervised-video-object-segmentation-on-18 | XMem | F-Measure (Seen): 88.0; F-Measure (Unseen): 87.1; Jaccard (Seen): 83.6; Jaccard (Unseen): 78.5; Overall: 84.3
semi-supervised-video-object-segmentation-on-18 | XMem (BL30K) | F-Measure (Seen): 89.2; F-Measure (Unseen): 88.8; Jaccard (Seen): 84.8; Jaccard (Unseen): 80.3; Overall: 85.8
semi-supervised-video-object-segmentation-on-20 | XMem | FPS: 29.6
semi-supervised-video-object-segmentation-on-21 | XMem | F: 62.0; J: 53.3; J&F: 57.6
video-object-segmentation-on-davis-2016 | XMem (BL30K, MS) | F-Score: 94.4; J&F: 93.3; Jaccard (Mean): 92.2
video-object-segmentation-on-davis-2016 | XMem | F-Score: 92.7; J&F: 91.5; Jaccard (Mean): 90.4
video-object-segmentation-on-davis-2017-test | XMem | F-measure: 84.5; Jaccard: 77.4; Mean Jaccard & F-Measure: 81.0
video-object-segmentation-on-davis-2017-test | XMem (BL30K, MS) | F-measure: 87.0; Jaccard: 80.5; Mean Jaccard & F-Measure: 83.7
video-object-segmentation-on-davis-2017-val | XMem | F-measure: 89.5; Jaccard: 82.9; Mean Jaccard & F-Measure: 86.2
video-object-segmentation-on-davis-2017-val | XMem (BL30K, MS) | F-measure: 92.6; Jaccard: 86.3; Mean Jaccard & F-Measure: 89.5
video-object-segmentation-on-youtube-vos | XMem (MS) | F-Measure (Seen): 89.9; F-Measure (Unseen): 89.9; Jaccard (Seen): 85.3; Jaccard (Unseen): 81.7; Overall: 86.7
video-object-segmentation-on-youtube-vos | XMem | F-Measure (Seen): 89.3; F-Measure (Unseen): 88.7; Jaccard (Seen): 84.6; Jaccard (Unseen): 80.2; Overall: 85.7; Speed (FPS): 22.6
video-object-segmentation-on-youtube-vos | XMem (YouTubeVOS only) | F-Measure (Seen): 88.5; F-Measure (Unseen): 87.2; Jaccard (Seen): 83.7; Jaccard (Unseen): 78.2; Overall: 84.4; Speed (FPS): 22.6
video-object-segmentation-on-youtube-vos | XMem (BL30K, MS) | F-Measure (Seen): 90.3; F-Measure (Unseen): 90.2; Jaccard (Seen): 85.6; Jaccard (Unseen): 81.7; Overall: 86.9
video-object-segmentation-on-youtube-vos | XMem (BL30K) | F-Measure (Seen): 89.8; F-Measure (Unseen): 89.2; Jaccard (Seen): 85.1; Jaccard (Unseen): 80.3; Overall: 86.1; Speed (FPS): 22.6
video-object-segmentation-on-youtube-vos-1 | XMem (BL30K, MS) | F-Measure (Seen): 90.3; F-Measure (Unseen): 90.2; Jaccard (Seen): 85.6; Jaccard (Unseen): 81.7; Mean Jaccard & F-Measure: 86.9
video-object-segmentation-on-youtube-vos-2019-2 | XMem | F-Measure (Seen): 88.6; F-Measure (Unseen): 88.6; Jaccard (Seen): 84.3; Jaccard (Unseen): 80.3; Mean Jaccard & F-Measure: 85.5
video-object-segmentation-on-youtube-vos-2019-2 | XMem (BL30K, MS) | F-Measure (Seen): 89.8; F-Measure (Unseen): 89.9; Jaccard (Seen): 85.5; Jaccard (Unseen): 81.8; Mean Jaccard & F-Measure: 86.8
visual-object-tracking-on-davis-2016 | XMem (DAVIS+YouTubeVOS only) | F-measure (Mean): 91.9; J&F: 90.8; Jaccard (Mean): 89.6; Speed (FPS): 29.6
visual-object-tracking-on-davis-2016 | XMem (MS) | F-measure (Mean): 93.5; J&F: 92.7; Jaccard (Mean): 92.0
visual-object-tracking-on-davis-2016 | XMem (DAVIS only) | F-measure (Mean): 88.9; J&F: 87.8; Jaccard (Mean): 86.7; Speed (FPS): 29.6
visual-object-tracking-on-davis-2016 | XMem (BL30K, MS) | F-measure (Mean): 94.4; J&F: 93.3; Jaccard (Mean): 92.2
visual-object-tracking-on-davis-2016 | XMem (BL30K) | F-measure (Mean): 93.2; J&F: 92.0; Jaccard (Mean): 90.7; Speed (FPS): 29.6
visual-object-tracking-on-davis-2016 | XMem | F-measure (Mean): 92.7; J&F: 91.5; Jaccard (Mean): 90.4; Speed (FPS): 29.6
visual-object-tracking-on-davis-2017 | XMem (MS) | F-measure (Mean): 91.0; J&F: 88.2; Jaccard (Mean): 85.4
visual-object-tracking-on-davis-2017 | XMem (BL30K, MS) | F-measure (Mean): 92.6; J&F: 89.5; Jaccard (Mean): 86.3
visual-object-tracking-on-davis-2017 | XMem (DAVIS only) | F-measure (Mean): 79.3; J&F: 76.7; Jaccard (Mean): 74.1; Speed (FPS): 22.6
visual-object-tracking-on-davis-2017 | XMem | F-measure (Mean): 89.5; J&F: 86.2; Jaccard (Mean): 82.9; Speed (FPS): 22.6
visual-object-tracking-on-davis-2017 | XMem (BL30K) | F-measure (Mean): 91.4; J&F: 87.7; Jaccard (Mean): 84.0; Speed (FPS): 22.6
visual-object-tracking-on-davis-2017 | XMem (DAVIS and YouTubeVOS only) | F-measure (Mean): 87.6; J&F: 84.5; Jaccard (Mean): 81.4; Speed (FPS): 22.6
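
For readers unfamiliar with the metrics listed above: the Jaccard score is the intersection over union of the predicted and ground-truth masks, and the combined Jaccard and F-measure score is the average of the two means. The following is a minimal sketch of both, not the official evaluation code. The function names `jaccard` and `j_and_f`, the nested-list mask format, and the convention of scoring two empty masks as 1.0 are assumptions made for illustration. The boundary F-measure itself is not computed here.

```python
def jaccard(pred, truth):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for row_pred, row_truth in zip(pred, truth):
        for p, t in zip(row_pred, row_truth):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def j_and_f(j_mean, f_mean):
    """Combined score: the average of the mean Jaccard and the mean F-measure."""
    return (j_mean + f_mean) / 2.0


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
iou = jaccard(pred, truth)        # 2 overlapping pixels out of 4 in the union
combined = j_and_f(77.6, 84.7)    # values taken from the first benchmark row
print(iou, combined)
```

Recomputing the combined score from the rounded J and F values in a row can differ from the listed figure by up to about 0.1, since the listed figures were presumably derived from unrounded values.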
