Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Zongxin Yang; Yi Yang

Abstract

This paper focuses on developing a more effective method of hierarchical propagation for semi-supervised Video Object Segmentation (VOS). Based on vision transformers, the recently developed Associating Objects with Transformers (AOT) approach introduces hierarchical propagation into VOS and has shown promising results. Hierarchical propagation gradually propagates information from past frames to the current frame and shifts the current-frame feature from object-agnostic to object-specific. However, the increase of object-specific information inevitably leads to the loss of object-agnostic visual information in deep propagation layers. To solve this problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach. First, DeAOT decouples the hierarchical propagation of object-agnostic and object-specific embeddings by handling them in two independent branches. Second, to compensate for the additional computation from dual-branch propagation, we propose an efficient module for constructing hierarchical propagation, the Gated Propagation Module, which is carefully designed with single-head attention. Extensive experiments show that DeAOT significantly outperforms AOT in both accuracy and efficiency. On YouTube-VOS, DeAOT achieves 86.0% at 22.4 fps and 82.0% at 53.4 fps. Without test-time augmentations, we achieve new state-of-the-art performance on four benchmarks: YouTube-VOS (86.2%), DAVIS 2017 (86.2%), DAVIS 2016 (92.9%), and VOT 2020 (0.622). Project page: https://github.com/z-x-yang/AOT.
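The two ideas in the abstract, a visual (object-agnostic) branch and an ID (object-specific) branch, each propagated with gated single-head attention, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the function names, the sigmoid gate, the use of the current-frame features as gate inputs, and the reuse of one attention map for both branches are not taken from the abstract and are not the authors' implementation.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Single-head scaled dot-product attention weights, one row per query."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(q * k for q, k in zip(qv, kv)) for kv in keys])
        for qv in queries
    ]

def propagate(weights, values):
    """Weighted sum of memory values for each query position."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_propagate(weights, values, gate_inputs):
    """Scale each propagated feature elementwise by a sigmoid gate (assumed form)."""
    out = propagate(weights, values)
    return [
        [sigmoid(g) * o for g, o in zip(gate_row, out_row)]
        for gate_row, out_row in zip(gate_inputs, out)
    ]

# Toy data: 2 current-frame positions, 3 memory positions, feature size 2.
current_visual = [[1.0, 0.0], [0.0, 1.0]]
memory_visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # object-agnostic features
memory_id = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]       # object-specific features

# Assumption of this sketch: the attention map is computed once from visual
# features and reused by both branches, so each branch keeps its own values.
weights = attention_weights(current_visual, memory_visual)
visual_out = gated_propagate(weights, memory_visual, current_visual)
id_out = gated_propagate(weights, memory_id, current_visual)
```

Keeping `memory_visual` and `memory_id` as separate value sets is the point of the sketch: the object-specific values never overwrite the object-agnostic ones, which is the loss the abstract says single-branch propagation suffers in deep layers.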

Code Repositories

yoxu515/aot-benchmark (PyTorch; mentioned in GitHub)
z-x-yang/AOT (official; Paddle; mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
semi-supervised-video-object-segmentation-on-1 | DeAOT-S | F-measure (Mean): 79.0; FPS: 49.2; J&F: 75.4; Jaccard (Mean): 71.9
semi-supervised-video-object-segmentation-on-1 | DeAOT-B | F-measure (Mean): 79.9; FPS: 40.9; J&F: 76.2; Jaccard (Mean): 72.5
semi-supervised-video-object-segmentation-on-1 | DeAOT-L | F-measure (Mean): 81.7; FPS: 28.5; J&F: 77.9; Jaccard (Mean): 74.1
semi-supervised-video-object-segmentation-on-1 | DeAOT-T | F-measure (Mean): 77.3; FPS: 63.5; J&F: 73.7; Jaccard (Mean): 70.0
semi-supervised-video-object-segmentation-on-1 | R50-DeAOT-L | F-measure (Mean): 84.5; FPS: 27.0; J&F: 80.7; Jaccard (Mean): 76.9
semi-supervised-video-object-segmentation-on-1 | SwinB-DeAOT-L | F-measure (Mean): 86.7; FPS: 15.4; J&F: 82.8; Jaccard (Mean): 78.9
semi-supervised-video-object-segmentation-on-15 | SwinB-DeAOT-L | EAO: 0.622; EAO (real-time): 0.559
semi-supervised-video-object-segmentation-on-15 | R50-DeAOT-L | EAO: 0.613; EAO (real-time): 0.571
semi-supervised-video-object-segmentation-on-15 | DeAOT-B | EAO: 0.571; EAO (real-time): 0.542
semi-supervised-video-object-segmentation-on-15 | DeAOT-L | EAO: 0.591; EAO (real-time): 0.554
semi-supervised-video-object-segmentation-on-15 | DeAOT-T | EAO: 0.472; EAO (real-time): 0.463
semi-supervised-video-object-segmentation-on-15 | DeAOT-S | EAO: 0.593; EAO (real-time): 0.559
semi-supervised-video-object-segmentation-on-18 | DeAOT-B | F-Measure (Seen): 88.3; F-Measure (Unseen): 87.5; FPS: 30.4; Jaccard (Seen): 83.5; Jaccard (Unseen): 79.1; Overall: 84.6
semi-supervised-video-object-segmentation-on-18 | DeAOT-L | F-Measure (Seen): 88.8; F-Measure (Unseen): 87.2; FPS: 24.7; Jaccard (Seen): 83.8; Jaccard (Unseen): 79.0; Overall: 84.7
semi-supervised-video-object-segmentation-on-18 | R50-DeAOT-L | F-Measure (Seen): 89.4; F-Measure (Unseen): 88.9; FPS: 22.4; Jaccard (Seen): 84.6; Jaccard (Unseen): 80.8; Overall: 85.9
semi-supervised-video-object-segmentation-on-18 | SwinB-DeAOT-L | F-Measure (Seen): 90.2; F-Measure (Unseen): 88.6; FPS: 11.9; Jaccard (Seen): 85.3; Jaccard (Unseen): 80.4; Overall: 86.1
semi-supervised-video-object-segmentation-on-18 | DeAOT-S | F-Measure (Seen): 87.5; F-Measure (Unseen): 86.8; FPS: 38.7; Jaccard (Seen): 82.8; Jaccard (Unseen): 78.1; Overall: 83.8
semi-supervised-video-object-segmentation-on-18 | DeAOT-T | F-Measure (Seen): 85.6; F-Measure (Unseen): 84.7; FPS: 53.4; Jaccard (Seen): 81.2; Jaccard (Unseen): 76.4; Overall: 82.0
semi-supervised-video-object-segmentation-on-21 | DeAOT | F: 63.8; J: 55.1; J&F: 59.4
video-object-segmentation-on-youtube-vos | R50-DeAOT-L | F-Measure (Seen): 89.9; F-Measure (Unseen): 88.7; Jaccard (Seen): 84.9; Jaccard (Unseen): 80.4; Overall: 86.0; Params(M): 19.8; Speed (FPS): 22.4
video-object-segmentation-on-youtube-vos | DeAOT-L | F-Measure (Seen): 89.4; Jaccard (Seen): 84.2; Jaccard (Unseen): 78.6; Overall: 84.8; Speed (FPS): 24.7
video-object-segmentation-on-youtube-vos | SwinB-DeAOT-L | F-Measure (Seen): 90.6; F-Measure (Unseen): 88.4; Jaccard (Seen): 85.6; Jaccard (Unseen): 80.0; Overall: 86.2; Params(M): 70.3; Speed (FPS): 11.9
video-object-segmentation-on-youtube-vos | DeAOT-S | F-Measure (Seen): 88.3; F-Measure (Unseen): 86.6; Jaccard (Seen): 83.3; Jaccard (Unseen): 77.9; Overall: 84.0; Params(M): 10.2; Speed (FPS): 38.7
video-object-segmentation-on-youtube-vos | DeAOT-B | F-Measure (Seen): 88.9; F-Measure (Unseen): 87.0; Jaccard (Seen): 83.9; Jaccard (Unseen): 78.5; Overall: 84.6; Params(M): 13.2; Speed (FPS): 30.4
video-object-segmentation-on-youtube-vos | DeAOT-T | F-Measure (Seen): 86.3; F-Measure (Unseen): 84.2; Jaccard (Seen): 81.6; Jaccard (Unseen): 75.8; Overall: 82.0; Params(M): 7.2; Speed (FPS): 53.4
visual-object-tracking-on-davis-2016 | DeAOT-B | F-measure (Mean): 92.5; J&F: 91.0; Jaccard (Mean): 89.4; Speed (FPS): 40.9
visual-object-tracking-on-davis-2016 | DeAOT-L | F-measure (Mean): 93.7; J&F: 92.0; Jaccard (Mean): 90.3; Speed (FPS): 28.5
visual-object-tracking-on-davis-2016 | SwinB-DeAOT-L | F-measure (Mean): 94.7; J&F: 92.9; Jaccard (Mean): 91.1; Speed (FPS): 15.4
visual-object-tracking-on-davis-2016 | DeAOT-T | F-measure (Mean): 89.9; J&F: 88.9; Jaccard (Mean): 87.8; Speed (FPS): 63.5
visual-object-tracking-on-davis-2016 | R50-DeAOT-L | F-measure (Mean): 94.0; J&F: 92.3; Jaccard (Mean): 90.5; Speed (FPS): 27.0
visual-object-tracking-on-davis-2016 | DeAOT-S | F-measure (Mean): 90.9; J&F: 89.3; Jaccard (Mean): 87.6; Speed (FPS): 49.2
visual-object-tracking-on-davis-2017 | DeAOT-S | F-measure (Mean): 83.8; J&F: 80.8; Jaccard (Mean): 77.8; Params(M): 10.2; Speed (FPS): 49.2
visual-object-tracking-on-davis-2017 | DeAOT-L | F-measure (Mean): 87.1; J&F: 84.1; Jaccard (Mean): 81.0; Params(M): 13.2; Speed (FPS): 28.5
visual-object-tracking-on-davis-2017 | SwinB-DeAOT-L | F-measure (Mean): 89.2; J&F: 86.2; Jaccard (Mean): 83.1; Params(M): 70.3; Speed (FPS): 15.4
visual-object-tracking-on-davis-2017 | DeAOT-T | F-measure (Mean): 83.3; J&F: 80.5; Jaccard (Mean): 77.7; Params(M): 7.2; Speed (FPS): 63.5
visual-object-tracking-on-davis-2017 | R50-DeAOT-L | F-measure (Mean): 88.2; J&F: 85.2; Jaccard (Mean): 82.2; Params(M): 19.8; Speed (FPS): 27.0
visual-object-tracking-on-davis-2017 | DeAOT-B | F-measure (Mean): 85.1; J&F: 82.2; Jaccard (Mean): 79.2; Params(M): 13.2; Speed (FPS): 40.9
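
The summary columns above can be related to the per-metric columns with a short sketch. It assumes the usual DAVIS-style definitions: Jaccard (J) is the intersection-over-union of a predicted and a ground-truth mask, J&F is the mean of the Jaccard and F-measure means, and the YouTube-VOS "Overall" score is the mean of the four seen/unseen J and F values. The function names are hypothetical, and returning 1.0 for two empty masks is a convention chosen for this sketch.

```python
from statistics import mean

def jaccard(pred_mask, gt_mask):
    """Region similarity J: intersection-over-union of two binary masks."""
    pred = [bool(p) for row in pred_mask for p in row]
    gt = [bool(g) for row in gt_mask for g in row]
    inter = sum(p and g for p, g in zip(pred, gt))
    union = sum(p or g for p, g in zip(pred, gt))
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union

def j_and_f(jaccard_mean, f_mean):
    """J&F score, assumed to be the average of the J and F means."""
    return (jaccard_mean + f_mean) / 2

def youtube_vos_overall(j_seen, j_unseen, f_seen, f_unseen):
    """Overall score, assumed to be the average of the four sub-metrics."""
    return mean([j_seen, j_unseen, f_seen, f_unseen])

# DeAOT-B row of semi-supervised-video-object-segmentation-on-1 (J&F listed as 76.2).
deaot_b_jf = j_and_f(72.5, 79.9)

# R50-DeAOT-L row of semi-supervised-video-object-segmentation-on-18 (Overall listed as 85.9).
r50_overall = youtube_vos_overall(84.6, 80.8, 89.4, 88.9)

print(round(deaot_b_jf, 1), round(r50_overall, 1))
```

Because the listed values are rounded to one decimal, a recomputed average can differ from the listed summary by about 0.1 for some rows.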
