
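Abstract
Few-shot Semantic Segmentation (FSS) has attracted wide attention in recent years. Its goal is to accurately segment target objects in a query image given only a few annotated support images of the target class. The key to this task is to fully exploit the information in the support images by capturing fine-grained correlations between the query and support images. However, most existing methods either compress the support information into a few class-wise prototypes or use only part of the support information at the pixel level (e.g., only the foreground region), which causes unavoidable information loss. This paper proposes Dense pixel-wise Cross-query-and-support Attention weighted Mask Aggregation (DCAMA), which fully exploits both foreground and background support information through multi-level pixel-wise correlations between paired query and support features. Implemented with the scaled dot-product attention of the Transformer architecture, DCAMA treats every query pixel as a token, computes its similarity to all support pixels, and predicts its segmentation label as a weighted aggregation of all support pixels' labels, with the similarities as weights. This formulation gives DCAMA strong expressive power. Building on it, we further design an efficient and effective one-pass inference scheme for n-shot segmentation, which collects the pixels of all support images and performs the mask aggregation at once, markedly improving inference efficiency. Experiments on the standard FSS benchmarks PASCAL-5i, COCO-20i, and FSS-1000 show that DCAMA significantly outperforms the previous state of the art, with absolute improvements of 3.1%, 9.7%, and 3.6% in 1-shot mIoU, respectively. Ablation studies further verify the effectiveness and soundness of each design component of DCAMA.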
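The core idea of the abstract, predicting each query pixel's label as a similarity-weighted sum of all support pixels' labels, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`aggregate_mask`, `aggregate_mask_n_shot`) and the flat list-of-vectors data layout are assumptions made for illustration, and the sketch omits the learned projections, multi-level features, and later refinement that a full model would use. It uses only the Python standard library; a practical version would rely on a tensor library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_mask(query_feats, support_feats, support_labels):
    """Return a soft foreground score in [0, 1] for each query pixel.

    query_feats:    list of d-dimensional feature vectors, one per query pixel
    support_feats:  list of d-dimensional feature vectors, one per support pixel
    support_labels: list of 0/1 labels (background/foreground), one per support pixel
    """
    scale = math.sqrt(len(query_feats[0]))
    preds = []
    for q in query_feats:
        # Scaled dot-product similarity between this query pixel ("token")
        # and every support pixel, foreground and background alike.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in support_feats]
        weights = softmax(scores)
        # Weighted aggregation of the support labels.
        preds.append(sum(w * y for w, y in zip(weights, support_labels)))
    return preds


def aggregate_mask_n_shot(query_feats, shots):
    """One-pass n-shot variant (hypothetical helper).

    shots: list of (support_feats, support_labels) pairs, one per support image.
    Pixels of all support images are pooled and aggregated in a single pass.
    """
    all_feats = [f for feats, _ in shots for f in feats]
    all_labels = [y for _, labels in shots for y in labels]
    return aggregate_mask(query_feats, all_feats, all_labels)
```

Because the softmax runs over the pooled support pixels, adding more support images only lengthens the key and label lists; no per-shot predictions need to be computed and merged afterwards.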
Code Repositories
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| few-shot-semantic-segmentation-on-coco-20i-1 | DCAMA (Swin-B) | FB-IoU: 73.2; Mean IoU: 50.9 |
| few-shot-semantic-segmentation-on-coco-20i-1 | DCAMA (ResNet-101) | FB-IoU: 69.9; Mean IoU: 43.5; learnable parameters (million): 47.7 |
| few-shot-semantic-segmentation-on-coco-20i-1 | DCAMA (ResNet-50) | FB-IoU: 69.5; Mean IoU: 43.3 |
| few-shot-semantic-segmentation-on-coco-20i-2-1 | DCAMA (Swin-B) | mIoU: 31.7 |
| few-shot-semantic-segmentation-on-coco-20i-5 | DCAMA (ResNet-50) | FB-IoU: 71.7; Mean IoU: 48.3; learnable parameters (million): 47.7 |
| few-shot-semantic-segmentation-on-coco-20i-5 | DCAMA (Swin-B) | FB-IoU: 76.9; Mean IoU: 58.3 |
| few-shot-semantic-segmentation-on-coco-20i-5 | DCAMA (ResNet-101) | FB-IoU: 73.3; Mean IoU: 51.9; learnable parameters (million): 47.7 |
| few-shot-semantic-segmentation-on-fss-1000-1 | DCAMA (ResNet-101) | FB-IoU: 92.4; Mean IoU: 88.3 |
| few-shot-semantic-segmentation-on-fss-1000-1 | DCAMA (Swin-B) | FB-IoU: 93.8; Mean IoU: 90.1 |
| few-shot-semantic-segmentation-on-fss-1000-1 | DCAMA (ResNet-50) | FB-IoU: 92.5; Mean IoU: 88.2 |
| few-shot-semantic-segmentation-on-fss-1000-5 | DCAMA (Swin-B) | FB-IoU: 94.1; Mean IoU: 90.4 |
| few-shot-semantic-segmentation-on-fss-1000-5 | DCAMA (ResNet-50) | FB-IoU: 92.9; Mean IoU: 88.8 |
| few-shot-semantic-segmentation-on-fss-1000-5 | DCAMA (ResNet-101) | FB-IoU: 93.1; Mean IoU: 89.1 |
| few-shot-semantic-segmentation-on-pascal-5i-1 | DCAMA (Swin-B) | FB-IoU: 78.5; Mean IoU: 69.3 |
| few-shot-semantic-segmentation-on-pascal-5i-1 | DCAMA (ResNet-50) | FB-IoU: 75.7; Mean IoU: 64.6 |
| few-shot-semantic-segmentation-on-pascal-5i-1 | DCAMA (ResNet-101) | FB-IoU: 77.6 |
| few-shot-semantic-segmentation-on-pascal-5i-5 | DCAMA (ResNet-50) | FB-IoU: 79.5; Mean IoU: 68.5 |
| few-shot-semantic-segmentation-on-pascal-5i-5 | DCAMA (Swin-B) | FB-IoU: 82.9; Mean IoU: 74.9 |
| few-shot-semantic-segmentation-on-pascal-5i-5 | DCAMA (ResNet-101) | FB-IoU: 80.8; Mean IoU: 68.3 |