BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

Abstract

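Video object segmentation (VOS) is fundamental to video understanding. Transformer-based methods have shown significant performance gains on semi-supervised VOS. However, existing methods still struggle to segment objects that are visually similar and close to each other. This paper proposes a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-supervised VOS. The method captures object motion in the video through a novel optical flow calibration module, which fuses the segmentation mask with the optical flow estimate to improve optical flow smoothness within objects and reduce noise at object boundaries. The calibrated optical flow is then used in the proposed bilateral attention mechanism, which computes the correspondence between the query frame and the reference frame while taking both motion and appearance into account. Extensive experiments validate the effectiveness of the BATMAN architecture, which outperforms all existing state-of-the-art methods on four popular VOS benchmarks: Youtube-VOS 2019 (85.0%), Youtube-VOS 2018 (85.3%), DAVIS 2017 Val/Testdev (86.2%/82.2%), and DAVIS 2016 (92.5%).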
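To make the idea of attention restricted to a motion-appearance neighborhood more concrete, here is a toy sketch in pure Python. It is not the BATMAN implementation: the token layout (`pos`, `flow`, `feat`, `value`), the function name `bilateral_attention`, and the `radius` and `flow_tol` thresholds are all assumptions made for illustration, and the scalar `flow` values merely stand in for a calibrated optical flow. The sketch only shows why gating attention by motion as well as position can help separate two look-alike objects that sit next to each other.

```python
import math

def bilateral_attention(query, reference, radius=1, flow_tol=0.5):
    """Toy attention limited to reference tokens near the query in position AND motion.

    Hypothetical token format: dict with 'pos' (int), 'flow' (float), 'feat' (list);
    reference tokens also carry 'value' (e.g. a mask probability).
    """
    outputs = []
    for q in query:
        # Neighborhood gate: close in space and moving similarly.
        neighbors = [
            r for r in reference
            if abs(r["pos"] - q["pos"]) <= radius
            and abs(r["flow"] - q["flow"]) <= flow_tol
        ]
        if not neighbors:
            outputs.append(0.0)  # assumption: no neighbor is treated as background
            continue
        # Appearance similarity (dot product), then a numerically stable softmax.
        scores = [sum(a * b for a, b in zip(q["feat"], r["feat"])) for r in neighbors]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(sum(w * r["value"] for w, r in zip(weights, neighbors)) / total)
    return outputs

# Two identical-looking neighbors moving in opposite directions.
reference = [
    {"pos": 0, "flow": 1.0, "feat": [1.0, 0.0], "value": 1.0},   # target object
    {"pos": 1, "flow": -1.0, "feat": [1.0, 0.0], "value": 0.0},  # look-alike distractor
]
query = [{"pos": 1, "flow": 1.0, "feat": [1.0, 0.0]}]

print(bilateral_attention(query, reference, flow_tol=0.5))   # [1.0]
print(bilateral_attention(query, reference, flow_tol=10.0))  # [0.5]
```

With a loose motion threshold the gate reduces to a purely spatial window, and the two identical appearance features split the attention evenly. With a tight threshold only the token that moves like the query contributes.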

Benchmarks

Benchmark: video-object-segmentation-on-davis-2016

| Method | F-Score | J&F | Jaccard (Mean) |
| --- | --- | --- | --- |
| KMN (val) | 91.5 | 90.5 | 89.5 |
| AOT (val) | 92.1 | 91.1 | 90.1 |
| RMN (val) | 88.7 | 88.8 | 88.9 |
| CFBI (val) | 90.5 | 89.4 | 88.3 |
| STCN (val) | 92.5 | 91.6 | 90.8 |
| STM (val) | 89.9 | - | 88.7 |
| LCM (val) | 91.4 | 90.7 | 89.9 |
| CFBI+ (val) | 91.1 | 89.9 | 88.7 |
| TransVOS (val) | 91.2 | 90.5 | 89.8 |
| BATMAN (val) | 94.2 | 92.5 | 90.7 |
| RPCMVOS (val) | 94 | 90.6 | 87.1 |
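On DAVIS, the J&F score is conventionally the average of the mean Jaccard index (J) and the mean boundary F-score (F). The short check below recomputes it for a few DAVIS 2016 entries listed above. The helper name `j_and_f` is ours, and small differences from the listed values are expected because the listed J and F are already rounded to one decimal.

```python
def j_and_f(jaccard_mean, f_score):
    """Average of region similarity (J) and boundary accuracy (F)."""
    return (jaccard_mean + f_score) / 2

# (Jaccard mean, F-Score, reported J&F) taken from the DAVIS 2016 entries above.
rows = {
    "KMN": (89.5, 91.5, 90.5),
    "AOT": (90.1, 92.1, 91.1),
    "BATMAN": (90.7, 94.2, 92.5),
}

for method, (j, f, reported) in rows.items():
    computed = j_and_f(j, f)
    print(f"{method}: computed {computed:.2f}, reported {reported}")
```

For BATMAN the recomputed value is 92.45, which matches the reported 92.5 after rounding.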
Benchmark: video-object-segmentation-on-davis-2017-test-1

| Method | F-measure | Jaccard | Mean Jaccard & F-Measure |
| --- | --- | --- | --- |
| RMN | 78.1 | 71.9 | - |
| TransVOS | 80.9 | 73 | 76.9 |
| CFBI+ | - | 71.6 | 75.6 |
| BATMAN | 86.1 | 78.4 | 82.2 |
| CFBI | 78.7 | 71.4 | 75 |
| LCM | 81.8 | 74.4 | 78.1 |
| STCN | 79.6 | 72.7 | 76.1 |
| KMN | 80.3 | 74.1 | 77.2 |
Benchmark: video-object-segmentation-on-davis-2017-val

| Method | F-measure | Jaccard | Mean Jaccard & F-Measure |
| --- | --- | --- | --- |
| TransVOS | 86.4 | 81.4 | 83.9 |
| AOT | 87.5 | 82.3 | 84.9 |
| CFBI | 84.5 | 79.3 | 81.9 |
| RMN | 86 | 81 | 83.5 |
| STM | 84.3 | 79.2 | - |
| LWL | 84.1 | 79.1 | 81.6 |
| BATMAN | 89.3 | - | 86.2 |
| SST | 85.1 | 79.9 | 82.5 |
| CFBI+ | 85.7 | 80.1 | 82.9 |
| STCN | 88.6 | 82.2 | 85.4 |
| LCM | 86.5 | 80.5 | - |
| RPCMVOS | - | 81.3 | 83.7 |
| KMN | 85.6 | 80 | 82.8 |
| AFB-URR | 76.1 | 73 | 74.6 |
Benchmark: video-object-segmentation-on-youtube-vos-1

| Method | F-Measure (Seen) | F-Measure (Unseen) | Jaccard (Seen) | Jaccard (Unseen) | Mean Jaccard & F-Measure |
| --- | --- | --- | --- | --- | --- |
| SST | - | - | 81.2 | 76 | 81.7 |
| AFB-URR | 83.1 | 82.6 | 78.8 | 74.1 | 79.6 |
| KMN | 85.6 | 83.3 | 81.4 | 75.3 | 81.4 |
| TransVOS | 86.7 | 83.4 | 82 | 75 | 81.8 |
| LWL | 84.9 | 84.4 | 80.4 | 76.4 | 81.5 |
| RPCMVOS | 87.7 | 86.7 | 83.1 | 78.5 | 84 |
| AOT | 88.5 | 86.1 | 83.7 | 78.1 | 84.1 |
| STCN | 86.5 | 85.7 | 81.9 | 77.9 | 83 |
| RMN | 85.7 | 82.4 | 82.1 | 75.7 | - |
| CFBI | 85.8 | - | 81.1 | - | - |
| CFBI+ | 86.6 | 85.6 | 81.8 | 77.1 | 82.8 |
| STM | 84.2 | 80.9 | 79.7 | 72.8 | 79.4 |
| LCM | - | - | 82.2 | - | 82 |
Benchmark: video-object-segmentation-on-youtube-vos-2019-2

| Method | F-Measure (Seen) | F-Measure (Unseen) | Jaccard (Seen) | Jaccard (Unseen) | Mean Jaccard & F-Measure |
| --- | --- | --- | --- | --- | --- |
| BATMAN | 89.3 | 87.2 | 84.5 | 79 | 85 |
| CFBI | 85.1 | 83 | 80.6 | 75.2 | 81 |
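The overall YouTube-VOS score is conventionally the plain average of the four per-split metrics: Jaccard and F-measure on seen and unseen categories. The sketch below recomputes it for the two YouTube-VOS 2019 entries above. The function name `youtube_vos_overall` is ours, and since the inputs are already rounded, agreement is only expected to within about 0.1.

```python
def youtube_vos_overall(j_seen, j_unseen, f_seen, f_unseen):
    """Average of the four per-split scores (seen/unseen x J/F)."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4

# (J seen, J unseen, F seen, F unseen, reported overall) from the entries above.
rows = {
    "BATMAN": (84.5, 79.0, 89.3, 87.2, 85.0),
    "CFBI": (80.6, 75.2, 85.1, 83.0, 81.0),
}

for method, (js, ju, fs, fu, reported) in rows.items():
    computed = youtube_vos_overall(js, ju, fs, fu)
    print(f"{method}: computed {computed:.2f}, reported {reported}")
```

Because seen and unseen categories are weighted equally, a method cannot reach a high overall score on seen categories alone.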
Benchmark: visual-object-tracking-on-youtube-vos

| Method | F-Measure (Seen) | F-Measure (Unseen) | Jaccard (Unseen) |
| --- | --- | --- | --- |
| TransVOS | 86.7 | 83.4 | - |
| KMN | - | - | 75.3 |
| RMN | - | - | 75.7 |
| CFBI | - | 83.4 | - |
