
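Abstract
This paper tackles semi-supervised video object segmentation on resource-constrained devices such as mobile phones. We formulate the problem as a knowledge distillation task and show experimentally that small space-time-memory networks with finite memory can achieve results competitive with the state of the art at a fraction of the computational cost (32 milliseconds per frame on a Samsung Galaxy S22). Specifically, we provide a theoretically grounded framework that unifies knowledge distillation with supervised contrastive representation learning. Models trained under this framework benefit jointly from pixel-wise contrastive learning and from distillation from a pretrained teacher. We validate the resulting loss on the standard DAVIS and YouTube benchmarks, where it reaches J&F scores competitive with the state of the art despite running up to 5x faster and using 32x fewer parameters.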
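
To make the idea of combining the two training signals concrete, the following is a minimal generic sketch and not the loss derived in the paper. It assumes a standard KL-divergence distillation term between teacher and student per-pixel class logits, a standard supervised contrastive term over per-pixel embeddings, and a simple weighted sum. All function names, the temperatures, and the `weight` parameter are hypothetical choices made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of one pixel's class logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over pixels (assumed generic KD term)."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        log_t = log_softmax(t, temperature)
        log_s = log_softmax(s, temperature)
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return total / len(student_logits)


def normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over pixel embeddings.

    Pixels with the same label are positives; all other pixels are negatives.
    Anchors without any positive are skipped.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def combined_loss(student_logits, teacher_logits, embeddings, labels, weight=1.0):
    """Hypothetical weighted sum of the two terms; not the paper's formulation."""
    kd = distillation_loss(student_logits, teacher_logits)
    con = supervised_contrastive_loss(embeddings, labels)
    return kd + weight * con
```

In this sketch the distillation term vanishes when the student reproduces the teacher's logits, and the contrastive term shrinks as embeddings of same-object pixels cluster together and away from other pixels.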
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| video-object-segmentation-on-davis-2016 | MobileVOS (val) | F-Score: 92.6; J&F: 91.4; Jaccard (Mean): 90.3 |
| video-object-segmentation-on-youtube-vos-2019-2 | MobileVOS | F-Measure (Seen): 87.7; F-Measure (Unseen): 85.3; Jaccard (Seen): 83.2; Jaccard (Unseen): 76.9; Mean Jaccard & F-Measure: 83.3 |
| visual-object-tracking-on-davis-2016 | MobileVOS | F-measure (Mean): 91.6; J&F: 90.6; Jaccard (Mean): 89.7; Speed (FPS): 100.1 |
| visual-object-tracking-on-davis-2016 | MobileVOS (BL30K) | F-measure (Mean): 92.6; J&F: 91.4; Jaccard (Mean): 90.3; Speed (FPS): 100.1 |
| visual-object-tracking-on-davis-2017 | MobileVOS (BL30K) | F-measure (Mean): 88.9; J&F: 82.3; Params(M): 8.1; Speed (FPS): 90.6 |
| visual-object-tracking-on-davis-2017 | MobileVOS | F-measure (Mean): 87.1; J&F: 80.2; Params(M): 8.1; Speed (FPS): 90.6 |
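
The aggregate scores in the table can be related to their components with a short check. This sketch assumes the usual benchmark conventions: on DAVIS, J&F is the arithmetic mean of the Jaccard mean and the F-measure mean, and on YouTube-VOS the overall score is the mean of the four seen/unseen J and F values. The helper names and the 0.1 tolerance are hypothetical; the tolerance allows for the reported components already being rounded to one decimal.

```python
def mean_score(values):
    """Arithmetic mean of a list of component scores."""
    return sum(values) / len(values)


def consistent(reported, components, tol=0.1):
    """True if the reported aggregate matches the mean of its components within tol."""
    return abs(mean_score(components) - reported) <= tol


# DAVIS 2016 (val): Jaccard mean 90.3, F-Score 92.6, reported J&F 91.4
davis16_ok = consistent(91.4, [90.3, 92.6])

# YouTube-VOS 2019: F seen/unseen and J seen/unseen, reported overall 83.3
ytvos_ok = consistent(83.3, [87.7, 85.3, 83.2, 76.9])

print(davis16_ok, ytvos_ok)
```

Rows that report only one of J or F alongside J&F cannot be checked this way.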