Putting the Object Back into Video Object Segmentation

Abstract

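We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation stored in memory back into the video object segmentation result. Recent work on VOS employs bottom-up pixel-level memory reading, which struggles with matching noise, especially in the presence of distractors, and therefore performs worse on more challenging datasets. In contrast, Cutie performs top-down object-level memory reading by adapting a small set of object queries. Through these queries, it interacts iteratively with the bottom-up pixel features using a query-based object transformer (qt, hence Cutie). The object queries act as a high-level summary of the target object, while high-resolution feature maps are retained for accurate segmentation. Together with foreground-background masked attention, Cutie cleanly separates the semantics of the foreground object from the background. On the challenging MOSE dataset, Cutie improves over XMem by 8.7 J&F at a similar running time, and improves over DeAOT by 4.2 J&F while running three times faster. Code is available at: https://hkchengrex.github.io/Cutie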
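
The foreground-background masked attention mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `masked_cross_attention`, the single-head layout, the rule that the first half of the queries sees only foreground pixels while the second half sees only background pixels, and the fallback for empty regions are all assumptions made for illustration.

```python
import numpy as np

def masked_cross_attention(queries, pixel_feats, fg_mask):
    """Single-head cross-attention from object queries to pixel features.

    queries:     (N, C) object queries
    pixel_feats: (HW, C) flattened pixel features
    fg_mask:     (HW,) boolean, True where the pixel is foreground
    """
    n, c = queries.shape
    logits = queries @ pixel_feats.T / np.sqrt(c)  # (N, HW)

    # Assumption: the first half of the queries may attend only to
    # foreground pixels, the second half only to background pixels.
    allowed = np.empty((n, fg_mask.shape[0]), dtype=bool)
    allowed[: n // 2] = fg_mask
    allowed[n // 2:] = ~fg_mask

    # Assumption: if a query has no allowed pixel (empty foreground or
    # background), let it attend everywhere to avoid an undefined softmax.
    empty = ~allowed.any(axis=1)
    allowed[empty] = True

    # Masked softmax over the pixel axis.
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ pixel_feats, weights

# Toy usage: 4 queries, 6 pixels, 8 channels.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
feats = rng.normal(size=(6, 8))
fg = np.array([True, True, False, False, False, True])
out, w = masked_cross_attention(q, feats, fg)
```

In this sketch the attention weights of the first two queries are exactly zero on background pixels and those of the last two are zero on foreground pixels, which is one simple way to keep foreground and background summaries separate.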

Code Repositories

hkchengrex/Cutie
Official
pytorch
Mentioned in GitHub

Benchmarks

Benchmark | Method | Metrics
semi-supervised-video-object-segmentation-on-1 | Cutie (base, MEGA) | F-measure (Mean): 89.9; FPS: 36.4; J&F: 86.1; Jaccard (Mean): 82.4
semi-supervised-video-object-segmentation-on-1 | Cutie+ (base) | F-measure (Mean): 89.2; FPS: 17.9; J&F: 85.9; Jaccard (Mean): 82.6
semi-supervised-video-object-segmentation-on-1 | Cutie+ (base, MEGA) | F-measure (Mean): 91.4; FPS: 17.9; J&F: 88.1; Jaccard (Mean): 84.7
semi-supervised-video-object-segmentation-on-18 | Cutie+ (base, MEGA) | F-Measure (Seen): 90.6; F-Measure (Unseen): 90.5; J&F: 17.9; Jaccard (Seen): 86.3; Jaccard (Unseen): 82.7; Overall: 87.5
semi-supervised-video-object-segmentation-on-21 | Cutie (small, MEGA) | F: 72.9; FPS: 45.5; J: 64.3; J&F: 68.6
semi-supervised-video-object-segmentation-on-21 | Cutie+ (base, MEGA) | F: 75.8; FPS: 17.9; J: 67.6; J&F: 71.7
semi-supervised-video-object-segmentation-on-21 | Cutie (base) | F: 67.9; FPS: 36.4; J: 60.0; J&F: 64.0
semi-supervised-video-object-segmentation-on-21 | Cutie+ (small, MEGA) | F: 74.5; FPS: 20.6; J: 66.0; J&F: 70.3
semi-supervised-video-object-segmentation-on-21 | Cutie (small) | F: 66.2; FPS: 45.5; J: 58.2; J&F: 62.2
semi-supervised-video-object-segmentation-on-21 | Cutie (base, with mose) | F: 72.3; FPS: 36.4; J: 64.2; J&F: 68.3
semi-supervised-video-object-segmentation-on-21 | Cutie (base, MEGA) | F: 74.1; FPS: 36.4; J: 65.8; J&F: 69.9
semi-supervised-video-object-segmentation-on-21 | Cutie (small, with mose) | F: 71.7; FPS: 45.5; J: 63.1; J&F: 67.4
semi-supervised-video-object-segmentation-on-22 | Cutie (base, with mose, 600 pixels) | HOTA (all): 58.4; HOTA (common): 61.8; HOTA (uncommon): 57.5
semi-supervised-video-object-segmentation-on-22 | Cutie (base, MEGA, 600 pixels) | HOTA (all): 61.2; HOTA (common): 65.0; HOTA (uncommon): 60.3
semi-supervised-video-object-segmentation-on-23 | Cutie (base, MEGA, 600 pixels) | HOTA (all): 66.0; HOTA (common): 66.5; HOTA (uncommon): 65.9
semi-supervised-video-object-segmentation-on-23 | Cutie (base, with mose, 600 pixels) | HOTA (all): 62.6; HOTA (common): 63.8; HOTA (uncommon): 62.3
video-object-segmentation-on-mose | Cutie | J&F: 68.3
video-object-segmentation-on-youtube-vos | Cutie+ (base, MEGA) | F-Measure (Seen): 91.0; F-Measure (Unseen): 90.1; Jaccard (Seen): 86.6; Jaccard (Unseen): 82.2; Overall: 87.5; Speed (FPS): 17.9
visual-object-tracking-on-davis-2017 | Cutie+ (base, MEGA) | F-measure (Mean): 90.8; J&F: 88.1; Jaccard (Mean): 85.5; Speed (FPS): 17.9
visual-object-tracking-on-davis-2017 | Cutie (base) | F-measure (Mean): 91.1; J&F: 87.9; Jaccard (Mean): 84.6; Params(M): 36.4
visual-object-tracking-on-davis-2017 | Cutie+ (base) | F-measure (Mean): 93.4; J&F: 90.5; Jaccard (Mean): 87.5; Params(M): 17.9
visual-object-tracking-on-didi | Cutie | Tracking quality: 0.575
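
For readers unfamiliar with the metrics listed above, the following sketch shows how J (Jaccard index, i.e. region IoU) is commonly computed for a binary mask and how J&F is taken as the mean of J and the boundary F-measure. It is an illustrative sketch, not the official evaluation code: the helper names `jaccard` and `j_and_f` are hypothetical, the boundary F-measure itself is not implemented, and returning 1.0 when both masks are empty is an assumed convention.

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J: intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def j_and_f(j, f):
    """J&F: arithmetic mean of region similarity J and boundary accuracy F."""
    return (j + f) / 2.0

# Toy masks: 2 overlapping pixels out of 4 pixels in the union.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
iou = jaccard(pred, gt)

# Row "Cutie (small)" above reports J = 58.2 and F = 66.2.
score = j_and_f(58.2, 66.2)
```

For the "Cutie (small)" row, averaging the listed J and F gives 62.2, matching the listed J&F. Other rows can differ in the last digit because the reported J and F values are already rounded.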
