4 months ago

Associating Objects with Transformers for Video Object Segmentation

Abstract

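This paper investigates how to achieve better and more efficient embedding learning for semi-supervised video object segmentation in challenging multi-object scenarios. Existing state-of-the-art methods learn to decode features with a single positive object, so in multi-object scenarios they must match and segment each target separately, which multiplies the computational cost. To address this, we propose Associating Objects with Transformers (AOT), an approach that matches and decodes multiple objects uniformly. Specifically, AOT employs an identification mechanism that associates multiple targets in the same high-dimensional embedding space. As a result, matching and segmentation decoding for multiple objects can be processed simultaneously, as efficiently as for a single object. To model multi-object association sufficiently, a Long Short-Term Transformer is designed for constructing hierarchical matching and propagation. We conduct extensive experiments on both multi-object and single-object benchmarks to examine AOT variant networks of different complexities. In particular, our R50-AOT-L outperforms all existing state-of-the-art methods on three popular benchmarks, namely YouTube-VOS (84.1% J&F), DAVIS 2017 (84.9%), and DAVIS 2016 (91.1%), while running more than 3 times faster in multi-object scenarios. In addition, our AOT-T maintains real-time multi-object speed on the above benchmarks. Based on AOT, we ranked first in the 3rd Large-scale VOS Challenge.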
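The abstract only summarizes the identification mechanism, so the following is a minimal NumPy sketch of the general idea rather than the authors' implementation. It assumes a bank of identity vectors (here a random orthonormal `id_bank`, an illustrative choice), gives each object, background included, one distinct vector, and writes all objects into a single embedding map so they can be handled in one pass. The function names `assign_ids` and `decode_ids`, the shapes, and the dot-product "network" used in the demo are all hypothetical.

```python
import numpy as np


def assign_ids(masks, id_bank, rng):
    """Embed several one-hot object masks into one shared map.

    masks:   (N, H, W) one-hot masks, one per object (background included).
    id_bank: (M, C) identity vectors, with M >= N.
    Returns the (H, W, C) identification embedding and the chosen bank rows.
    """
    n_obj = masks.shape[0]
    # Assumption: every object gets a distinct, randomly chosen identity.
    chosen = rng.permutation(id_bank.shape[0])[:n_obj]
    # Each pixel receives the identity vector of the object that covers it.
    id_embedding = np.einsum("nhw,nc->hwc", masks, id_bank[chosen])
    return id_embedding, chosen


def decode_ids(logits, chosen):
    """Turn per-pixel scores over all M identities into object indices.

    logits: (H, W, M) scores; only the identities in `chosen` are kept.
    Returns an (H, W) map of object indices in the range [0, N).
    """
    used = logits[..., chosen]
    used = used - used.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(used) / np.exp(used).sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1)


rng = np.random.default_rng(0)
num_ids, channels = 10, 16
# Orthonormal rows make the identities trivially separable in this toy demo.
id_bank = np.linalg.qr(rng.normal(size=(channels, num_ids)))[0].T

# Label map with background (0) and two objects (1, 2).
labels = np.array([[0, 0, 1, 1],
                   [0, 2, 2, 1],
                   [0, 2, 2, 0],
                   [0, 0, 0, 0]])
masks = (labels[None] == np.arange(3)[:, None, None]).astype(np.float64)

id_embedding, chosen = assign_ids(masks, id_bank, rng)
# Stand-in for a real decoder: score each pixel against every identity.
logits = id_embedding @ id_bank.T
recovered = decode_ids(logits, chosen)
print(id_embedding.shape, bool((recovered == labels).all()))
```

In this toy setting all three regions travel through a single embedding map and a single decoding step, which is the efficiency argument the abstract makes; a real model would replace the dot product with learned matching and decoding layers.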

Code repositories

yoxu515/aot-benchmark
PyTorch
Mentioned on GitHub
z-x-yang/AOT
Official
Paddle
Mentioned on GitHub

Benchmarks

Benchmark | Method | Metrics
semi-supervised-video-object-segmentation-on-1 | SwinB-AOT-L | F-measure (Mean): 85.1; FPS: 12.1; J&F: 81.2; Jaccard (Mean): 77.3
semi-supervised-video-object-segmentation-on-1 | AOT-L | F-measure (Mean): 82.3; FPS: 18.7; J&F: 78.3; Jaccard (Mean): 74.3
semi-supervised-video-object-segmentation-on-1 | AOT-T | F-measure (Mean): 75.7; FPS: 51.4; J&F: 72.0; Jaccard (Mean): 68.3
semi-supervised-video-object-segmentation-on-1 | AOT-S | F-measure (Mean): 77.5; FPS: 40.0; J&F: 73.9; Jaccard (Mean): 70.3
semi-supervised-video-object-segmentation-on-1 | AOT-B | F-measure (Mean): 79.3; FPS: 29.6; J&F: 75.5; Jaccard (Mean): 71.6
semi-supervised-video-object-segmentation-on-1 | R50-AOT-L | F-measure (Mean): 83.3; FPS: 18.0; J&F: 79.6; Jaccard (Mean): 75.9
semi-supervised-video-object-segmentation-on-15 | SwinB-AOT-L | EAO: 0.586; EAO (real-time): 0.523
semi-supervised-video-object-segmentation-on-15 | AOT-S | EAO: 0.512; EAO (real-time): 0.499
semi-supervised-video-object-segmentation-on-15 | AOT-B | EAO: 0.541; EAO (real-time): 0.533
semi-supervised-video-object-segmentation-on-15 | AOT-L | EAO: 0.574; EAO (real-time): 0.560
semi-supervised-video-object-segmentation-on-15 | R50-AOT-L | EAO: 0.569; EAO (real-time): 0.540
semi-supervised-video-object-segmentation-on-15 | AOT-T | EAO: 0.435; EAO (real-time): 0.433
semi-supervised-video-object-segmentation-on-20 | AOT-S | D17 val (F): 82.0; D17 val (G): 79.2; D17 val (J): 76.4; FPS: 40.0
semi-supervised-video-object-segmentation-on-21 | AOT | F: 61.3; J: 53.1; J&F: 57.2
video-object-segmentation-on-davis-2017-test-1 | AOT | F-measure: 83.3; Jaccard: 75.9; Mean Jaccard & F-Measure: 79.6
video-object-segmentation-on-youtube-vos | AOT-T (all frames) | F-Measure (Seen): 84.7; F-Measure (Unseen): 83.5; Jaccard (Seen): 80.0; Jaccard (Unseen): 75.2; Overall: 80.9; Params(M): 5.3; Speed (FPS): 41.0
video-object-segmentation-on-youtube-vos | R50-AOT-L (all frames) | F-Measure (Seen): 89.5; F-Measure (Unseen): 88.2; Jaccard (Seen): 84.5; Jaccard (Unseen): 79.6; Overall: 85.5; Params(M): 14.9; Speed (FPS): 6.4
video-object-segmentation-on-youtube-vos | AOT-B (all frames) | F-Measure (Seen): 88.5; F-Measure (Unseen): 86.5; Jaccard (Seen): 83.6; Jaccard (Unseen): 78.0; Overall: 84.1; Params(M): 8.3; Speed (FPS): 20.5
video-object-segmentation-on-youtube-vos | AOT-B | F-Measure (Seen): 87.5; F-Measure (Unseen): 86.0; Jaccard (Seen): 82.6; Jaccard (Unseen): 77.7; Overall: 83.5; Params(M): 8.3; Speed (FPS): 20.5
video-object-segmentation-on-youtube-vos | AOT-S (all frames) | F-Measure (Seen): 87.0; F-Measure (Unseen): 85.7; Jaccard (Seen): 82.2; Jaccard (Unseen): 77.3; Overall: 83.0; Params(M): 7.9; Speed (FPS): 27.1
video-object-segmentation-on-youtube-vos | AOT-S | F-Measure (Seen): 86.7; F-Measure (Unseen): 85.0; Jaccard (Seen): 82.0; Jaccard (Unseen): 76.6; Overall: 82.6; Params(M): 7.9; Speed (FPS): 27.1
video-object-segmentation-on-youtube-vos | R50-AOT-L | F-Measure (Seen): 88.5; F-Measure (Unseen): 86.1; Jaccard (Seen): 83.7; Jaccard (Unseen): 78.1; Overall: 84.1; Params(M): 14.9; Speed (FPS): 14.9
video-object-segmentation-on-youtube-vos | SwinB-AOT-L | F-Measure (Seen): 89.3; F-Measure (Unseen): 86.4; Jaccard (Seen): 84.3; Jaccard (Unseen): 77.9; Overall: 84.5; Params(M): 65.4; Speed (FPS): 9.3
video-object-segmentation-on-youtube-vos | SwinB-AOT-L (all frames) | F-Measure (Seen): 90.1; F-Measure (Unseen): 86.9; Jaccard (Seen): 85.1; Jaccard (Unseen): 78.4; Overall: 85.1; Params(M): 65.4; Speed (FPS): 5.2
video-object-segmentation-on-youtube-vos | AOT-L (all frames) | F-Measure (Seen): 88.8; F-Measure (Unseen): 87.1; Jaccard (Seen): 83.7; Jaccard (Unseen): 78.4; Overall: 84.5; Params(M): 8.3; Speed (FPS): 6.5
video-object-segmentation-on-youtube-vos | AOT-T | F-Measure (Seen): 84.5; F-Measure (Unseen): 82.2; Jaccard (Seen): 80.1; Jaccard (Unseen): 74.0; Overall: 80.2; Params(M): 5.3; Speed (FPS): 41.0
video-object-segmentation-on-youtube-vos | AOT-L | F-Measure (Seen): 87.9; F-Measure (Unseen): 86.5; Jaccard (Seen): 82.9; Jaccard (Unseen): 77.7; Overall: 83.8; Params(M): 8.3; Speed (FPS): 16.0
video-object-segmentation-on-youtube-vos-2019-2 | AOT | F-Measure (Seen): 88.1; F-Measure (Unseen): 86.3; Jaccard (Seen): 83.5; Jaccard (Unseen): 78.4; Mean Jaccard & F-Measure: 84.1
visual-object-tracking-on-davis-2016 | SwinB-AOT-L | F-measure (Mean): 93.3; J&F: 92.0; Jaccard (Mean): 90.7; Speed (FPS): 12.1
visual-object-tracking-on-davis-2016 | AOT-L | F-measure (Mean): 91.1; J&F: 90.4; Jaccard (Mean): 89.6; Speed (FPS): 18.7
visual-object-tracking-on-davis-2016 | AOT-B | F-measure (Mean): 91.1; J&F: 89.9; Jaccard (Mean): 88.7; Speed (FPS): 29.6
visual-object-tracking-on-davis-2016 | R50-AOT-L | F-measure (Mean): 92.1; J&F: 91.1; Jaccard (Mean): 90.1; Speed (FPS): 18.0
visual-object-tracking-on-davis-2016 | AOT-S | F-measure (Mean): 90.2; J&F: 89.4; Jaccard (Mean): 88.6; Speed (FPS): 40.0
visual-object-tracking-on-davis-2016 | AOT-T | F-measure (Mean): 87.4; J&F: 86.8; Jaccard (Mean): 86.1; Speed (FPS): 51.4
visual-object-tracking-on-davis-2017 | AOT-S | F-measure (Mean): 83.9; J&F: 81.3; Jaccard (Mean): 78.7; Params(M): 7.0; Speed (FPS): 40.0
visual-object-tracking-on-davis-2017 | SwinB-AOT-L | F-measure (Mean): 88.4; J&F: 85.4; Jaccard (Mean): 82.4; Params(M): 65.4; Speed (FPS): 12.1
visual-object-tracking-on-davis-2017 | R50-AOT-L | F-measure (Mean): 87.5; J&F: 84.9; Jaccard (Mean): 82.3; Params(M): 14.9; Speed (FPS): 18.0
visual-object-tracking-on-davis-2017 | AOT-T | F-measure (Mean): 82.3; J&F: 79.9; Jaccard (Mean): 77.4; Params(M): 5.7; Speed (FPS): 51.4
visual-object-tracking-on-davis-2017 | AOT-L | F-measure (Mean): 86.4; J&F: 83.8; Jaccard (Mean): 81.1; Params(M): 8.3; Speed (FPS): 18.7
visual-object-tracking-on-davis-2017 | AOT-B | F-measure (Mean): 85.2; J&F: 82.5; Jaccard (Mean): 79.7; Params(M): 8.3; Speed (FPS): 29.6
visual-object-tracking-on-vot2022 | MS_AOT | EAO: 0.673
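
As a reading aid for the scores above: on the DAVIS-style benchmarks, the combined J&F value is the mean of the Jaccard (region similarity) and F-measure (boundary accuracy) scores, and the YouTube-VOS "Overall" value is the mean of the four seen/unseen J and F scores. The sketch below checks this arithmetic against two rows of the table and shows the Jaccard index as intersection over union of two masks. The function names are illustrative, and because the listed numbers are already rounded, the recomputed values match only up to rounding.

```python
import numpy as np


def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # convention: two empty masks count as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)


def j_and_f(j_mean, f_mean):
    """Combined DAVIS-style score: mean of J and F."""
    return (j_mean + f_mean) / 2


def youtube_vos_overall(j_seen, j_unseen, f_seen, f_unseen):
    """YouTube-VOS overall score: mean of the four seen/unseen scores."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4


# SwinB-AOT-L on the first benchmark: J = 77.3, F = 85.1, listed J&F = 81.2.
print(round(j_and_f(77.3, 85.1), 1))

# R50-AOT-L on YouTube-VOS: listed Overall = 84.1.
print(round(youtube_vos_overall(83.7, 78.1, 88.5, 86.1), 1))

# Toy masks: one shared pixel out of two covered pixels.
print(jaccard([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```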
