
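Abstract
Space-time memory (STM) network methods dominate semi-supervised video object segmentation (SVOS) because of their strong performance. In this work, we identify three aspects in which such methods can be improved: i) the supervisory signal, ii) pretraining, and iii) spatial awareness. We therefore propose TrickVOS, a generic, method-agnostic bag of tricks that addresses each aspect in turn: i) a structure-aware hybrid loss, ii) a simple decoder pretraining regime, and iii) a cheap tracker that imposes spatial constraints on the model's predictions. Finally, we design a lightweight network and show that, when trained with TrickVOS, it achieves results competitive with state-of-the-art methods on the DAVIS and YouTube benchmarks. It is also one of the first STM-based SVOS methods that can run in real time on a mobile device.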
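The abstract does not describe how the spatially constrained tracker works. The code below is a minimal sketch under one assumption: the constraint restricts the current frame's predicted mask to a padded bounding box around the previous frame's mask, which suppresses distant distractors at almost no cost. The names `bounding_box`, `constrain_to_previous`, and `margin` are hypothetical and do not come from TrickVOS.

```python
from typing import List, Optional, Tuple

# A mask or probability map stored as rows of floats in [0, 1].
Grid = List[List[float]]


def bounding_box(mask: Grid, threshold: float = 0.5) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of pixels above threshold; None if empty."""
    rows = [r for r, row in enumerate(mask) if any(v > threshold for v in row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] > threshold for row in mask)]
    return min(rows), min(cols), max(rows), max(cols)


def constrain_to_previous(prob: Grid, prev_mask: Grid, margin: int = 2) -> Grid:
    """Zero out predictions outside the previous frame's box padded by `margin` pixels (hypothetical)."""
    box = bounding_box(prev_mask)
    if box is None:
        # No object in the previous frame: leave the prediction unconstrained.
        return [row[:] for row in prob]
    height, width = len(prob), len(prob[0])
    # Pad the box by `margin` and clip it to the image bounds.
    top, left = max(box[0] - margin, 0), max(box[1] - margin, 0)
    bottom, right = min(box[2] + margin, height - 1), min(box[3] + margin, width - 1)
    return [
        [prob[r][c] if top <= r <= bottom and left <= c <= right else 0.0 for c in range(width)]
        for r in range(height)
    ]


# Toy example: the object sat at (1, 1) in the previous frame, but the
# current prediction fires everywhere, including a distractor at (4, 4).
prev = [[0.0] * 5 for _ in range(5)]
prev[1][1] = 1.0
prob = [[0.9] * 5 for _ in range(5)]
out = constrain_to_previous(prob, prev, margin=1)
for row in out:
    print(row)
```

Only the 3×3 region around the previous object position survives. A practical tracker would also need a way to recover when the object moves beyond the margin, which this sketch does not attempt.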
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| semi-supervised-video-object-segmentation-on-18 | Lightweight TrickVOS (PT) | F-Measure (Seen): 83.3; F-Measure (Unseen): 84; J score (unseen): 75.2; J&F: 80.5; Jaccard (Seen): 79.5 |
| semi-supervised-video-object-segmentation-on-18 | STCN + TrickVOS (PT) | F-Measure (Seen): 86.4; F-Measure (Unseen): 85.5; J&F: 82.8; Jaccard (Seen): 82.1; Jaccard (Unseen): 77.2 |
| semi-supervised-video-object-segmentation-on-2 | Lightweight TrickVOS (PT) | F-measure (Mean): 86; J&F: 82.7; Jaccard (Mean): 79.4; Speed (FPS): 76.4 |
| semi-supervised-video-object-segmentation-on-2 | STCN + TrickVOS (PT) | F-measure (Mean): 89.6; J&F: 86.1; Jaccard (Mean): 82.6; Speed (FPS): 35.1 |
| semi-supervised-video-object-segmentation-on-3 | STCN + TrickVOS (PT) | Speed (FPS): 45.4 |
| visual-object-tracking-on-davis-2016 | STCN + TrickVOS (PT) | F-measure (Mean): 93.1; J&F: 91.8; Jaccard (Mean): 90.5 |
| visual-object-tracking-on-davis-2016 | Lightweight TrickVOS (PT) | F-measure (Mean): 89.9; J&F: 89.3; Jaccard (Mean): 88.7; Speed (FPS): 86.4 |
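The J&F column can be cross-checked against the other metrics in each row. This sketch assumes the standard definitions. On DAVIS, J&F is the mean of region similarity J (Jaccard) and contour accuracy F. On the benchmark with seen and unseen splits (YouTube-VOS), the overall score is the mean of J and F over both splits. The helper names are illustrative only.

```python
from statistics import mean


def davis_jf(j_mean: float, f_mean: float) -> float:
    """DAVIS-style J&F: the mean of region similarity J and contour accuracy F."""
    return mean([j_mean, f_mean])


def youtube_overall(j_seen: float, f_seen: float, j_unseen: float, f_unseen: float) -> float:
    """YouTube-VOS-style overall score: the mean of J and F over seen and unseen categories."""
    return mean([j_seen, f_seen, j_unseen, f_unseen])


# Values copied from the table above, rounded to one decimal as reported.
print(round(davis_jf(79.4, 86.0), 1))                  # Lightweight TrickVOS, on-2
print(round(davis_jf(90.5, 93.1), 1))                  # STCN + TrickVOS, DAVIS 2016
print(round(youtube_overall(79.5, 83.3, 75.2, 84.0), 1))  # Lightweight TrickVOS, on-18
print(round(youtube_overall(82.1, 86.4, 77.2, 85.5), 1))  # STCN + TrickVOS, on-18
```

Under these definitions, each computed value matches the J&F figure reported in the corresponding table row.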