
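Abstract
Video object segmentation (VOS) is fundamental to video understanding. Transformer-based methods have brought significant performance gains in semi-supervised VOS. However, existing methods still struggle to segment objects that are visually similar and close to one another. This paper proposes a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-supervised VOS. The method captures object motion in the video through a novel optical flow calibration module, which fuses the segmentation mask with the optical flow estimate to improve flow smoothness within objects and reduce noise at object boundaries. The calibrated optical flow is then used in the proposed bilateral attention mechanism, which computes the correspondence between the query frame and the reference frames while taking both motion and appearance into account. Extensive experiments validate the effectiveness of the BATMAN architecture, which outperforms all existing state-of-the-art methods on four popular VOS benchmarks: Youtube-VOS 2019 (85.0%), Youtube-VOS 2018 (85.3%), DAVIS 2017 Val/Testdev (86.2%/82.2%), and DAVIS 2016 (92.5%).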
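The abstract does not spell out how the bilateral attention is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a query pixel attends only to reference pixels that are close to it both in image position and in a scalar motion encoding (for example, one derived from the calibrated optical flow). The function name `bilateral_attention`, its parameters, and the two window thresholds are all hypothetical.

```python
import math

def bilateral_attention(query, keys, values, q_pos, k_pos,
                        q_motion, k_motion, spatial_window, motion_window):
    """Attend from one query pixel to reference pixels near it in space AND motion.

    query:    feature vector of the query pixel (list of floats)
    keys:     one feature vector per reference pixel
    values:   one value vector per reference pixel
    q_pos:    (row, col) of the query pixel
    k_pos:    (row, col) of each reference pixel
    q_motion: scalar motion encoding of the query pixel (assumption)
    k_motion: scalar motion encoding of each reference pixel (assumption)
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = []
    for key, pos, motion in zip(keys, k_pos, k_motion):
        near_in_space = (abs(pos[0] - q_pos[0]) <= spatial_window
                         and abs(pos[1] - q_pos[1]) <= spatial_window)
        near_in_motion = abs(motion - q_motion) <= motion_window
        if near_in_space and near_in_motion:
            # Standard scaled dot-product appearance similarity.
            scores.append(scale * sum(q * k for q, k in zip(query, key)))
        else:
            # Outside the bilateral neighborhood: masked out of the softmax.
            scores.append(-math.inf)

    if all(s == -math.inf for s in scores):
        # No admissible reference pixel: return zeros instead of dividing by zero.
        return [0.0] * len(values[0]), [0.0] * len(values)

    # Numerically stable softmax; masked entries contribute exp(-inf) = 0.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


# Two reference pixels with identical appearance but different motion.
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[1.0], [0.0]]          # e.g. membership in the target object
positions = [(0, 1), (1, 0)]
motions = [0.1, 2.0]

gated, gated_w = bilateral_attention([1.0, 0.0], keys, values, (0, 0),
                                     positions, 0.0, motions, 1, 0.5)
loose, loose_w = bilateral_attention([1.0, 0.0], keys, values, (0, 0),
                                     positions, 0.0, motions, 1, 10.0)
print(gated_w)  # [1.0, 0.0]
print(loose_w)  # [0.5, 0.5]
```

In this toy case appearance alone cannot separate the two reference pixels, so the loosely gated call splits its attention evenly. Restricting the motion window makes the query attend only to the pixel that moves like it does, which is the kind of ambiguity between similar, nearby objects that the abstract describes.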
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| video-object-segmentation-on-davis-2016 | KMN (val) | F-Score: 91.5 J&F: 90.5 Jaccard (Mean): 89.5 |
| video-object-segmentation-on-davis-2016 | AOT (val) | F-Score: 92.1 J&F: 91.1 Jaccard (Mean): 90.1 |
| video-object-segmentation-on-davis-2016 | RMN (val) | F-Score: 88.7 J&F: 88.8 Jaccard (Mean): 88.9 |
| video-object-segmentation-on-davis-2016 | CFBI (val) | F-Score: 90.5 J&F: 89.4 Jaccard (Mean): 88.3 |
| video-object-segmentation-on-davis-2016 | STCN (val) | F-Score: 92.5 J&F: 91.6 Jaccard (Mean): 90.8 |
| video-object-segmentation-on-davis-2016 | STM (val) | F-Score: 89.9 Jaccard (Mean): 88.7 |
| video-object-segmentation-on-davis-2016 | LCM (val) | F-Score: 91.4 J&F: 90.7 Jaccard (Mean): 89.9 |
| video-object-segmentation-on-davis-2016 | CFBI+ (val) | F-Score: 91.1 J&F: 89.9 Jaccard (Mean): 88.7 |
| video-object-segmentation-on-davis-2016 | TransVOS (val) | F-Score: 91.2 J&F: 90.5 Jaccard (Mean): 89.8 |
| video-object-segmentation-on-davis-2016 | BATMAN (val) | F-Score: 94.2 J&F: 92.5 Jaccard (Mean): 90.7 |
| video-object-segmentation-on-davis-2016 | RPCMVOS (val) | F-Score: 94 J&F: 90.6 Jaccard (Mean): 87.1 |
| video-object-segmentation-on-davis-2017-test-1 | RMN | F-measure: 78.1 Jaccard: 71.9 |
| video-object-segmentation-on-davis-2017-test-1 | TransVOS | F-measure: 80.9 Jaccard: 73 Mean Jaccard & F-Measure: 76.9 |
| video-object-segmentation-on-davis-2017-test-1 | CFBI+ | Jaccard: 71.6 Mean Jaccard & F-Measure: 75.6 |
| video-object-segmentation-on-davis-2017-test-1 | BATMAN | F-measure: 86.1 Jaccard: 78.4 Mean Jaccard & F-Measure: 82.2 |
| video-object-segmentation-on-davis-2017-test-1 | CFBI | F-measure: 78.7 Jaccard: 71.4 Mean Jaccard & F-Measure: 75 |
| video-object-segmentation-on-davis-2017-test-1 | LCM | F-measure: 81.8 Jaccard: 74.4 Mean Jaccard & F-Measure: 78.1 |
| video-object-segmentation-on-davis-2017-test-1 | STCN | F-measure: 79.6 Jaccard: 72.7 Mean Jaccard & F-Measure: 76.1 |
| video-object-segmentation-on-davis-2017-test-1 | KMN | F-measure: 80.3 Jaccard: 74.1 Mean Jaccard & F-Measure: 77.2 |
| video-object-segmentation-on-davis-2017-val | TransVOS | F-measure: 86.4 Jaccard: 81.4 Mean Jaccard & F-Measure: 83.9 |
| video-object-segmentation-on-davis-2017-val | AOT | F-measure: 87.5 Jaccard: 82.3 Mean Jaccard & F-Measure: 84.9 |
| video-object-segmentation-on-davis-2017-val | CFBI | F-measure: 84.5 Jaccard: 79.3 Mean Jaccard & F-Measure: 81.9 |
| video-object-segmentation-on-davis-2017-val | RMN | F-measure: 86 Jaccard: 81 Mean Jaccard & F-Measure: 83.5 |
| video-object-segmentation-on-davis-2017-val | STM | F-measure: 84.3 Jaccard: 79.2 |
| video-object-segmentation-on-davis-2017-val | LWL | F-measure: 84.1 Jaccard: 79.1 Mean Jaccard & F-Measure: 81.6 |
| video-object-segmentation-on-davis-2017-val | BATMAN | F-measure: 89.3 Mean Jaccard & F-Measure: 86.2 |
| video-object-segmentation-on-davis-2017-val | SST | F-measure: 85.1 Jaccard: 79.9 Mean Jaccard & F-Measure: 82.5 |
| video-object-segmentation-on-davis-2017-val | CFBI+ | F-measure: 85.7 Jaccard: 80.1 Mean Jaccard & F-Measure: 82.9 |
| video-object-segmentation-on-davis-2017-val | STCN | F-measure: 88.6 Jaccard: 82.2 Mean Jaccard & F-Measure: 85.4 |
| video-object-segmentation-on-davis-2017-val | LCM | F-measure: 86.5 Jaccard: 80.5 |
| video-object-segmentation-on-davis-2017-val | RPCMVOS | Jaccard: 81.3 Mean Jaccard & F-Measure: 83.7 |
| video-object-segmentation-on-davis-2017-val | KMN | F-measure: 85.6 Jaccard: 80 Mean Jaccard & F-Measure: 82.8 |
| video-object-segmentation-on-davis-2017-val | AFB-URR | F-measure: 76.1 Jaccard: 73 Mean Jaccard & F-Measure: 74.6 |
| video-object-segmentation-on-youtube-vos-1 | SST | Jaccard (Seen): 81.2 Jaccard (Unseen): 76 Mean Jaccard & F-Measure: 81.7 |
| video-object-segmentation-on-youtube-vos-1 | AFB-URR | F-Measure (Seen): 83.1 F-Measure (Unseen): 82.6 Jaccard (Seen): 78.8 Jaccard (Unseen): 74.1 Mean Jaccard & F-Measure: 79.6 |
| video-object-segmentation-on-youtube-vos-1 | KMN | F-Measure (Seen): 85.6 F-Measure (Unseen): 83.3 Jaccard (Seen): 81.4 Jaccard (Unseen): 75.3 Mean Jaccard & F-Measure: 81.4 |
| video-object-segmentation-on-youtube-vos-1 | TransVOS | F-Measure (Seen): 86.7 F-Measure (Unseen): 83.4 Jaccard (Seen): 82 Jaccard (Unseen): 75 Mean Jaccard & F-Measure: 81.8 |
| video-object-segmentation-on-youtube-vos-1 | LWL | F-Measure (Seen): 84.9 F-Measure (Unseen): 84.4 Jaccard (Seen): 80.4 Jaccard (Unseen): 76.4 Mean Jaccard & F-Measure: 81.5 |
| video-object-segmentation-on-youtube-vos-1 | RPCMVOS | F-Measure (Seen): 87.7 F-Measure (Unseen): 86.7 Jaccard (Seen): 83.1 Jaccard (Unseen): 78.5 Mean Jaccard & F-Measure: 84 |
| video-object-segmentation-on-youtube-vos-1 | AOT | F-Measure (Seen): 88.5 F-Measure (Unseen): 86.1 Jaccard (Seen): 83.7 Jaccard (Unseen): 78.1 Mean Jaccard & F-Measure: 84.1 |
| video-object-segmentation-on-youtube-vos-1 | STCN | F-Measure (Seen): 86.5 F-Measure (Unseen): 85.7 Jaccard (Seen): 81.9 Jaccard (Unseen): 77.9 Mean Jaccard & F-Measure: 83 |
| video-object-segmentation-on-youtube-vos-1 | RMN | F-Measure (Seen): 85.7 F-Measure (Unseen): 82.4 Jaccard (Seen): 82.1 Jaccard (Unseen): 75.7 |
| video-object-segmentation-on-youtube-vos-1 | CFBI | F-Measure (Seen): 85.8 Jaccard (Seen): 81.1 |
| video-object-segmentation-on-youtube-vos-1 | CFBI+ | F-Measure (Seen): 86.6 F-Measure (Unseen): 85.6 Jaccard (Seen): 81.8 Jaccard (Unseen): 77.1 Mean Jaccard & F-Measure: 82.8 |
| video-object-segmentation-on-youtube-vos-1 | STM | F-Measure (Seen): 84.2 F-Measure (Unseen): 80.9 Jaccard (Seen): 79.7 Jaccard (Unseen): 72.8 Mean Jaccard & F-Measure: 79.4 |
| video-object-segmentation-on-youtube-vos-1 | LCM | Jaccard (Seen): 82.2 Mean Jaccard & F-Measure: 82 |
| video-object-segmentation-on-youtube-vos-2019-2 | BATMAN | F-Measure (Seen): 89.3 F-Measure (Unseen): 87.2 Jaccard (Seen): 84.5 Jaccard (Unseen): 79 Mean Jaccard & F-Measure: 85 |
| video-object-segmentation-on-youtube-vos-2019-2 | CFBI | F-Measure (Seen): 85.1 F-Measure (Unseen): 83 Jaccard (Seen): 80.6 Jaccard (Unseen): 75.2 Mean Jaccard & F-Measure: 81 |
| visual-object-tracking-on-youtube-vos | TransVOS | F-Measure (Seen): 86.7 F-Measure (Unseen): 83.4 |
| visual-object-tracking-on-youtube-vos | KMN | Jaccard (Unseen): 75.3 |
| visual-object-tracking-on-youtube-vos | RMN | Jaccard (Unseen): 75.7 |
| visual-object-tracking-on-youtube-vos | CFBI | F-Measure (Unseen): 83.4 |
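
The combined scores in the table are averages of the component metrics. On DAVIS, the combined score is the arithmetic mean of the Jaccard index (J) and the boundary F-measure (F). On YouTube-VOS, the overall score averages J and F over the seen and unseen categories. A minimal sketch of that arithmetic, using rows from the table above, follows. The helper names `mean_j_and_f` and `youtube_vos_overall` are hypothetical. Because the table lists rounded values, a recomputed mean can differ from the reported one by about 0.1.

```python
def mean_j_and_f(jaccard, f_measure):
    """DAVIS-style combined score: mean of region (J) and boundary (F) accuracy."""
    return (jaccard + f_measure) / 2.0


def youtube_vos_overall(j_seen, j_unseen, f_seen, f_unseen):
    """YouTube-VOS-style overall score: mean of J and F over seen and unseen classes."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4.0


# KMN on DAVIS 2016 (val): Jaccard 89.5, F-Score 91.5, reported combined score 90.5.
print(round(mean_j_and_f(89.5, 91.5), 1))

# BATMAN on YouTube-VOS 2019: reported overall score 85.
print(round(youtube_vos_overall(84.5, 79.0, 89.3, 87.2), 1))
```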