4 months ago

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Abstract

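This paper focuses on developing a more effective hierarchical propagation method for semi-supervised video object segmentation (VOS). Building on vision transformers, the recently proposed Associating Objects with Transformers (AOT) approach introduced hierarchical propagation into VOS and showed promising results. Hierarchical propagation gradually propagates information from past frames to the current frame and transforms the current frame's features from object-agnostic to object-specific. However, as object-specific information increases, object-agnostic visual information is inevitably lost in the deep propagation layers. To address this problem and further facilitate the learning of visual embeddings, this paper proposes Decoupling Features in Hierarchical Propagation (DeAOT). First, DeAOT decouples hierarchical propagation by handling object-agnostic and object-specific embeddings in two independent branches. Second, to compensate for the additional computation introduced by dual-branch propagation, we design an efficient module for constructing hierarchical propagation, the Gated Propagation Module, which is carefully designed with single-head attention. Extensive experiments show that DeAOT significantly outperforms AOT in both accuracy and efficiency. On YouTube-VOS, DeAOT achieves 86.0% at 22.4 fps and 82.0% at 53.4 fps. Without test-time augmentation, we achieve new state-of-the-art performance on four benchmarks: YouTube-VOS (86.2%), DAVIS 2017 (86.2%), DAVIS 2016 (92.9%), and VOT 2020 (0.622). Project page: https://github.com/z-x-yang/AOT.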
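The abstract describes two ideas: propagating object-agnostic (visual) and object-specific (ID) embeddings in separate branches, and a gated propagation step built on single-head attention. The toy sketch below illustrates one possible reading of those ideas in plain Python. It is not the DeAOT implementation: the function names, the sigmoid gate, the reuse of one attention map for both branches, and the requirement that all embeddings share the same dimension are assumptions made for illustration only.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Single-head scaled dot-product attention weights, one row per query."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
        for query in queries
    ]

def propagate(weights, values):
    """Weighted sum of memory values for each query position."""
    dim = len(values[0])
    return [
        [sum(w * value[d] for w, value in zip(row, values)) for d in range(dim)]
        for row in weights
    ]

def gated(gate_logits, propagated):
    """Element-wise sigmoid gate (hypothetical gating choice for this sketch)."""
    return [
        [p / (1.0 + math.exp(-g)) for g, p in zip(gate_row, prop_row)]
        for gate_row, prop_row in zip(gate_logits, propagated)
    ]

def decoupled_layer(cur_visual, mem_visual, mem_id):
    """One illustrative layer with two branches.

    Attention is computed from visual embeddings only, then used to propagate
    visual values and ID values separately, so the ID information never
    overwrites the visual branch. All embeddings are assumed to have the same
    dimension so the gate can be applied element-wise.
    """
    weights = attention_weights(cur_visual, mem_visual)
    visual_out = gated(cur_visual, propagate(weights, mem_visual))
    id_out = gated(cur_visual, propagate(weights, mem_id))
    return visual_out, id_out

# Two current-frame positions, two memory positions, 2-dimensional embeddings.
cur_visual = [[1.0, 0.0], [0.0, 1.0]]
mem_visual = [[1.0, 0.0], [0.0, 1.0]]
mem_id = [[1.0, 0.0], [0.0, 1.0]]  # one-hot object identities in memory
visual_out, id_out = decoupled_layer(cur_visual, mem_visual, mem_id)
```

In this toy setup the first current-frame position matches the first memory position most closely, so its ID output leans toward the first object identity, while the visual branch is kept as a separate output.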

Code repositories

yoxu515/aot-benchmark (PyTorch; mentioned on GitHub)
z-x-yang/AOT (official; Paddle; mentioned on GitHub)

Benchmarks

Benchmark / Method / Metrics
Benchmark: semi-supervised-video-object-segmentation-on-1

| Method | F-measure (Mean) | FPS | J&F | Jaccard (Mean) |
| --- | --- | --- | --- | --- |
| DeAOT-S | 79.0 | 49.2 | 75.4 | 71.9 |
| DeAOT-B | 79.9 | 40.9 | 76.2 | 72.5 |
| DeAOT-L | 81.7 | 28.5 | 77.9 | 74.1 |
| DeAOT-T | 77.3 | 63.5 | 73.7 | 70.0 |
| R50-DeAOT-L | 84.5 | 27.0 | 80.7 | 76.9 |
| SwinB-DeAOT-L | 86.7 | 15.4 | 82.8 | 78.9 |

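In the DAVIS-style metrics listed above, J&F is the average of the mean Jaccard index (J) and the mean F-measure (F). The short check below recomputes it from the reported J and F values. Because the page only gives values rounded to one decimal, the recomputed number can differ from the reported one by a rounding step, so the comparison uses a tolerance of 0.1 (an assumption of this sketch).

```python
# (Jaccard mean, F-measure mean, reported J&F) as listed for
# semi-supervised-video-object-segmentation-on-1.
reported = {
    "DeAOT-T": (70.0, 77.3, 73.7),
    "DeAOT-S": (71.9, 79.0, 75.4),
    "DeAOT-B": (72.5, 79.9, 76.2),
    "DeAOT-L": (74.1, 81.7, 77.9),
    "R50-DeAOT-L": (76.9, 84.5, 80.7),
    "SwinB-DeAOT-L": (78.9, 86.7, 82.8),
}

def j_and_f(jaccard_mean, f_mean):
    """J&F is the arithmetic mean of region similarity J and contour accuracy F."""
    return (jaccard_mean + f_mean) / 2

# Largest gap between the recomputed and the reported J&F across all methods.
max_gap = max(abs(j_and_f(j, f) - jf) for j, f, jf in reported.values())
print(f"largest gap between recomputed and reported J&F: {max_gap:.2f}")
```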
Benchmark: semi-supervised-video-object-segmentation-on-15

| Method | EAO | EAO (real-time) |
| --- | --- | --- |
| SwinB-DeAOT-L | 0.622 | 0.559 |
| R50-DeAOT-L | 0.613 | 0.571 |
| DeAOT-B | 0.571 | 0.542 |
| DeAOT-L | 0.591 | 0.554 |
| DeAOT-T | 0.472 | 0.463 |
| DeAOT-S | 0.593 | 0.559 |

Benchmark: semi-supervised-video-object-segmentation-on-18

| Method | F-Measure (Seen) | F-Measure (Unseen) | FPS | Jaccard (Seen) | Jaccard (Unseen) | Overall |
| --- | --- | --- | --- | --- | --- | --- |
| DeAOT-B | 88.3 | 87.5 | 30.4 | 83.5 | 79.1 | 84.6 |
| DeAOT-L | 88.8 | 87.2 | 24.7 | 83.8 | 79.0 | 84.7 |
| R50-DeAOT-L | 89.4 | 88.9 | 22.4 | 84.6 | 80.8 | 85.9 |
| SwinB-DeAOT-L | 90.2 | 88.6 | 11.9 | 85.3 | 80.4 | 86.1 |
| DeAOT-S | 87.5 | 86.8 | 38.7 | 82.8 | 78.1 | 83.8 |
| DeAOT-T | 85.6 | 84.7 | 53.4 | 81.2 | 76.4 | 82.0 |

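For the seen/unseen results listed above, the Overall score is consistent with the usual YouTube-VOS convention of averaging four numbers: J and F on seen categories and J and F on unseen categories. The sketch below recomputes Overall for each method under that convention. The class name and the 0.1 rounding tolerance are choices made for this illustration.

```python
from dataclasses import dataclass

@dataclass
class SeenUnseenScores:
    """Scores for one method, as listed for ...-on-18 (hypothetical container)."""
    f_seen: float
    f_unseen: float
    j_seen: float
    j_unseen: float
    overall: float

    def recomputed_overall(self) -> float:
        # Overall = mean of J (seen), J (unseen), F (seen), F (unseen).
        return (self.j_seen + self.j_unseen + self.f_seen + self.f_unseen) / 4

scores = {
    "DeAOT-T": SeenUnseenScores(85.6, 84.7, 81.2, 76.4, 82.0),
    "DeAOT-S": SeenUnseenScores(87.5, 86.8, 82.8, 78.1, 83.8),
    "DeAOT-B": SeenUnseenScores(88.3, 87.5, 83.5, 79.1, 84.6),
    "DeAOT-L": SeenUnseenScores(88.8, 87.2, 83.8, 79.0, 84.7),
    "R50-DeAOT-L": SeenUnseenScores(89.4, 88.9, 84.6, 80.8, 85.9),
    "SwinB-DeAOT-L": SeenUnseenScores(90.2, 88.6, 85.3, 80.4, 86.1),
}

# Gap between recomputed and reported Overall; small gaps come from rounding.
gaps = {name: abs(s.recomputed_overall() - s.overall) for name, s in scores.items()}
for name, gap in gaps.items():
    print(f"{name}: recomputed vs reported gap = {gap:.3f}")
```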
Benchmark: semi-supervised-video-object-segmentation-on-21

| Method | F | J | J&F |
| --- | --- | --- | --- |
| DeAOT | 63.8 | 55.1 | 59.4 |

Benchmark: video-object-segmentation-on-youtube-vos (a dash marks a value not listed in the source)

| Method | F-Measure (Seen) | F-Measure (Unseen) | Jaccard (Seen) | Jaccard (Unseen) | Overall | Params (M) | Speed (FPS) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R50-DeAOT-L | 89.9 | 88.7 | 84.9 | 80.4 | 86.0 | 19.8 | 22.4 |
| DeAOT-L | 89.4 | - | 84.2 | 78.6 | 84.8 | - | 24.7 |
| SwinB-DeAOT-L | 90.6 | 88.4 | 85.6 | 80.0 | 86.2 | 70.3 | 11.9 |
| DeAOT-S | 88.3 | 86.6 | 83.3 | 77.9 | 84.0 | 10.2 | 38.7 |
| DeAOT-B | 88.9 | 87.0 | 83.9 | 78.5 | 84.6 | 13.2 | 30.4 |
| DeAOT-T | 86.3 | 84.2 | 81.6 | 75.8 | 82.0 | 7.2 | 53.4 |

Benchmark: visual-object-tracking-on-davis-2016

| Method | F-measure (Mean) | J&F | Jaccard (Mean) | Speed (FPS) |
| --- | --- | --- | --- | --- |
| DeAOT-B | 92.5 | 91.0 | 89.4 | 40.9 |
| DeAOT-L | 93.7 | 92.0 | 90.3 | 28.5 |
| SwinB-DeAOT-L | 94.7 | 92.9 | 91.1 | 15.4 |
| DeAOT-T | 89.9 | 88.9 | 87.8 | 63.5 |
| R50-DeAOT-L | 94.0 | 92.3 | 90.5 | 27.0 |
| DeAOT-S | 90.9 | 89.3 | 87.6 | 49.2 |

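The Jaccard values above measure region similarity: the intersection over union between a predicted object mask and the ground-truth mask, averaged over frames and objects. A minimal sketch of the per-frame computation on tiny binary masks follows. The masks are made up for illustration, and returning 1.0 when both masks are empty is a convention assumed here.

```python
def jaccard(pred_mask, true_mask):
    """Intersection over union of two binary masks given as nested 0/1 lists."""
    pairs = [
        (p, t)
        for pred_row, true_row in zip(pred_mask, true_mask)
        for p, t in zip(pred_row, true_row)
    ]
    intersection = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

# Hypothetical 2x3 masks: 2 pixels overlap, 4 pixels are covered in total.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
iou = jaccard(pred, truth)
print(iou)  # → 0.5
```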
Benchmark: visual-object-tracking-on-davis-2017

| Method | F-measure (Mean) | J&F | Jaccard (Mean) | Params (M) | Speed (FPS) |
| --- | --- | --- | --- | --- | --- |
| DeAOT-S | 83.8 | 80.8 | 77.8 | 10.2 | 49.2 |
| DeAOT-L | 87.1 | 84.1 | 81.0 | 13.2 | 28.5 |
| SwinB-DeAOT-L | 89.2 | 86.2 | 83.1 | 70.3 | 15.4 |
| DeAOT-T | 83.3 | 80.5 | 77.7 | 7.2 | 63.5 |
| R50-DeAOT-L | 88.2 | 85.2 | 82.2 | 19.8 | 27.0 |
| DeAOT-B | 85.1 | 82.2 | 79.2 | 13.2 | 40.9 |
