
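Abstract
Referring Image Segmentation (RIS) is an advanced vision-language task that involves identifying and segmenting objects in an image according to free-form text descriptions. While previous studies have focused mainly on aligning visual and language features, training techniques such as data augmentation remain relatively underexplored. In this work, we explore effective data augmentation for RIS and propose a novel training framework called Masked Referring Image Segmentation (MaskRIS). We observe that conventional image augmentations fall short in RIS and lead to performance degradation, whereas simple random masking significantly improves RIS performance. MaskRIS combines image and text masking and uses Distortion-aware Contextual Learning (DCL) to fully exploit the benefits of the masking strategy. This approach improves the model's robustness to occlusions, incomplete information, and various linguistic complexities, resulting in a significant performance gain. Experiments show that MaskRIS can be easily applied to various RIS models and outperforms existing methods in both fully supervised and weakly supervised settings. Finally, MaskRIS achieves new state-of-the-art performance on the RefCOCO, RefCOCO+, and RefCOCOg datasets. Code is available at https://github.com/naver-ai/maskris.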
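
To make the idea of random masking on both modalities concrete, here is a minimal sketch in plain Python. It is not taken from the naver-ai/maskris repository: the function names, the patch size, the masking ratios, and the `[MASK]` placeholder token are all assumptions chosen for illustration, and DCL itself is not covered.

```python
import random


def random_patch_mask(height, width, patch, ratio, rng):
    """Return a height x width grid of 0/1 flags; 0 marks a masked pixel.

    The image is divided into patch x patch cells, and a `ratio` fraction
    of the cells is chosen uniformly at random to be masked.
    (Hypothetical helper; patch size and ratio are illustrative.)
    """
    rows, cols = height // patch, width // patch
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    masked = set(rng.sample(cells, int(round(ratio * len(cells)))))
    return [
        [0 if (y // patch, x // patch) in masked else 1 for x in range(width)]
        for y in range(height)
    ]


def mask_image(image, keep):
    """Zero out the masked pixels of a 2-D grayscale image (list of rows)."""
    return [[p * k for p, k in zip(row, krow)] for row, krow in zip(image, keep)]


def mask_text(tokens, ratio, rng, mask_token="[MASK]"):
    """Replace each token with mask_token independently with probability ratio."""
    return [mask_token if rng.random() < ratio else t for t in tokens]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
image = [[1] * 8 for _ in range(8)]  # toy 8 x 8 image
keep = random_patch_mask(8, 8, patch=2, ratio=0.5, rng=rng)
masked_image = mask_image(image, keep)

tokens = "the man in the red shirt".split()
masked_tokens = mask_text(tokens, ratio=0.3, rng=rng)
```

In a training loop, the masked image and masked expression would be fed to the model alongside (or instead of) the originals; how the two views are combined is a property of the actual method and is not specified here.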
Code Repositories
naver-ai/maskris
Official
PyTorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| referring-expression-segmentation-on-refcoco | MaskRIS (Swin-B, combined DB) | Overall IoU: 78.71 |
| referring-expression-segmentation-on-refcoco | MaskRIS (Swin-B) | Mean IoU: 78.35; Overall IoU: 76.49 |
| referring-expression-segmentation-on-refcoco-3 | MaskRIS (Swin-B, combined DB) | Overall IoU: 70.26 |
| referring-expression-segmentation-on-refcoco-3 | MaskRIS (Swin-B) | Mean IoU: 71.68; Overall IoU: 67.54 |
| referring-expression-segmentation-on-refcoco-4 | MaskRIS (Swin-B) | Mean IoU: 76.73; Overall IoU: 74.46 |
| referring-expression-segmentation-on-refcoco-4 | MaskRIS (Swin-B, combined DB) | Overall IoU: 75.15 |
| referring-expression-segmentation-on-refcoco-5 | MaskRIS (Swin-B) | Mean IoU: 64.5; Overall IoU: 59.39 |
| referring-expression-segmentation-on-refcoco-5 | MaskRIS (Swin-B, combined DB) | Overall IoU: 62.83 |
| referring-expression-segmentation-on-refcoco-8 | MaskRIS (Swin-B) | Mean IoU: 80.24; Overall IoU: 78.96 |
| referring-expression-segmentation-on-refcoco-8 | MaskRIS (Swin-B, combined DB) | Overall IoU: 80.64 |
| referring-expression-segmentation-on-refcoco-9 | MaskRIS (Swin-B) | Mean IoU: 76.06; Overall IoU: 73.96 |
| referring-expression-segmentation-on-refcoco-9 | MaskRIS (Swin-B, combined DB) | Overall IoU: 75.1 |
| referring-expression-segmentation-on-refcocog | MaskRIS (Swin-B) | Mean IoU: 69.31; Overall IoU: 65.55 |
| referring-expression-segmentation-on-refcocog | MaskRIS (Swin-B, combined DB) | Overall IoU: 69.12 |
| referring-expression-segmentation-on-refcocog-1 | MaskRIS (Swin-B) | Mean IoU: 69.42; Overall IoU: 66.5 |
| referring-expression-segmentation-on-refcocog-1 | MaskRIS (Swin-B, combined DB) | Overall IoU: 71.09 |
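
The table reports two metrics without defining them. The sketch below assumes the definitions commonly used in the RIS literature: Mean IoU averages the per-sample IoU, while Overall IoU divides the intersection summed over all samples by the union summed over all samples, which gives large objects more weight. The function names and the toy masks are illustrative, and the evaluation code in the official repository may differ in detail.

```python
def intersection_and_union(pred, target):
    """Count intersection and union pixels of two binary masks (lists of rows)."""
    inter = sum(p & t for prow, trow in zip(pred, target) for p, t in zip(prow, trow))
    union = sum(p | t for prow, trow in zip(pred, target) for p, t in zip(prow, trow))
    return inter, union


def mean_and_overall_iou(pairs):
    """Return (mean IoU, overall IoU) for a list of (pred, target) mask pairs."""
    per_sample = []
    total_inter = total_union = 0
    for pred, target in pairs:
        inter, union = intersection_and_union(pred, target)
        # An empty union is scored as 0 here; this convention is an assumption.
        per_sample.append(inter / union if union else 0.0)
        total_inter += inter
        total_union += union
    mean_iou = sum(per_sample) / len(per_sample)
    overall_iou = total_inter / total_union if total_union else 0.0
    return mean_iou, overall_iou


# Toy example: one poorly segmented small object, one perfectly segmented large one.
small = ([[1, 0], [0, 0]], [[1, 1], [0, 0]])  # IoU = 1/2
large = ([[1, 1], [1, 1]], [[1, 1], [1, 1]])  # IoU = 4/4
mean_iou, overall_iou = mean_and_overall_iou([small, large])
# mean_iou is (0.5 + 1.0) / 2 = 0.75; overall_iou is (1 + 4) / (2 + 4) = 5/6
```

Because Overall IoU pools pixels across samples, the two metrics can rank methods differently, which is why both appear in the table where available.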