Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
Cheng Ho Kei; Tai Yu-Wing; Tang Chi-Keung

Abstract
We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance. Trained separately, the interaction module converts user interactions to an object mask, which is then temporally propagated by our propagation module using a novel top-$k$ filtering strategy in reading the space-time memory. To effectively take the user's intent into account, a novel difference-aware module is proposed to learn how to properly fuse the masks before and after each interaction, which are aligned with the target frames by employing the space-time memory. We evaluate our method both qualitatively and quantitatively with different forms of user interactions (e.g., scribbles, clicks) on DAVIS to show that our method outperforms current state-of-the-art algorithms while requiring fewer frame interactions, with the additional advantage of generalizing to different types of user interactions. We contribute a large-scale synthetic VOS dataset with pixel-accurate segmentation of 4.8M frames to accompany our source code to facilitate future research.
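The top-$k$ filtered memory read mentioned above can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: affinity is assumed to be a plain dot product between key embeddings, the function name `topk_memory_read` and its array layout are hypothetical, and `k` is left as a free parameter. For each query location, only the `k` memory locations with the highest affinity are kept, the softmax is taken over those alone, and the matching memory values are averaged with the resulting weights.

```python
import numpy as np


def topk_memory_read(mem_key, mem_val, qry_key, k):
    """Hypothetical sketch of a top-k filtered space-time memory read.

    mem_key: (C_k, N) keys of all memory locations (N = frames * H * W, flattened)
    mem_val: (C_v, N) values stored at the same memory locations
    qry_key: (C_k, M) keys of the query frame (M = H * W, flattened)
    k:       number of memory locations kept per query location (assumed free parameter)
    Returns: (C_v, M) features read out for every query location.
    """
    # Assumed affinity: dot product between every memory key and every query key -> (N, M)
    affinity = mem_key.T @ qry_key
    k = min(k, affinity.shape[0])

    # Indices of the k largest affinities along the memory axis, per query column -> (k, M)
    idx = np.argpartition(-affinity, k - 1, axis=0)[:k]
    top = np.take_along_axis(affinity, idx, axis=0)

    # Softmax over the retained k entries only (shifted by the max for numerical stability)
    top = top - top.max(axis=0, keepdims=True)
    weights = np.exp(top)
    weights /= weights.sum(axis=0, keepdims=True)

    # Gather the matching memory values: mem_val[:, idx] has shape (C_v, k, M)
    gathered = mem_val[:, idx]
    return (gathered * weights[None]).sum(axis=1)
```

With `k` equal to the number of memory locations, this reduces to ordinary softmax attention over the whole memory; a smaller `k` drops low-affinity matches before normalization, which is the intent of the filtering step.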
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| interactive-video-object-segmentation-on | MiVOS | AUC-J: 0.849; AUC-J&F: 0.879; J&F@60s: 0.885; J@60s: 0.854 |
| semi-supervised-video-object-segmentation-on-1 | MiVOS | F-measure (Decay): 14.5; F-measure (Mean): 80.2; F-measure (Recall): 87.6; J&F: 76.5; Jaccard (Decay): 14.9; Jaccard (Mean): 72.7; Jaccard (Recall): 81.2 |
| video-object-segmentation-on-youtube-vos | MiVOS | F-Measure (Seen): 84.7; F-Measure (Unseen): 85.5; Jaccard (Seen): 80.6; Jaccard (Unseen): 77.3; Overall: 82.0 |
| visual-object-tracking-on-davis-2016 | MiVOS | F-measure (Decay): 5.1; F-measure (Mean): 92.4; F-measure (Recall): 96.4; J&F: 91.0; Jaccard (Decay): 6.6; Jaccard (Mean): 89.7; Jaccard (Recall): 97.5; Speed (FPS): 16.9 |
| visual-object-tracking-on-davis-2017 | MiVOS | F-measure (Decay): 8.2; F-measure (Mean): 87.4; F-measure (Recall): 93.1; J&F: 84.5; Jaccard (Decay): 7.0; Jaccard (Mean): 81.7; Jaccard (Recall): 90.9; Speed (FPS): 11.2 |
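In the DAVIS-style rows, J&F is the mean of the Jaccard (region similarity, J) mean and the F-measure (boundary accuracy, F) mean. The short sketch below, using hypothetical helper names, computes J as the intersection-over-union of two binary masks and J&F from the reported means; the boundary F-measure requires contour matching and is omitted here. The empty-mask convention in `jaccard` is an assumption.

```python
import numpy as np


def jaccard(pred, gt):
    """Region similarity J: intersection-over-union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match
        return 1.0
    return np.logical_and(pred, gt).sum() / union


def j_and_f(j_mean, f_mean):
    """J&F: the average of the J mean and the F mean."""
    return (j_mean + f_mean) / 2
```

For example, `j_and_f(89.7, 92.4)` gives about 91.05 and `j_and_f(81.7, 87.4)` gives about 84.55, which agree with the reported 91.0 and 84.5 to within the table's one-decimal rounding.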