
Abstract
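Few-shot Semantic Segmentation aims to accurately segment target objects in query images given only a handful of annotated examples. However, many previous state-of-the-art methods either have to discard intricate local semantic features or suffer from high computational complexity. To address these challenges, this paper proposes a new few-shot semantic segmentation framework based on the Transformer architecture. The method introduces a spatial transformer decoder and a contextual mask generation module to strengthen the modeling of relationships between support images and query images. In addition, we design a multi-scale decoder that hierarchically fuses features from different resolutions to refine the segmentation mask. The method also integrates global features from intermediate encoder stages to improve contextual understanding, while keeping the network lightweight to reduce computational cost. This balance between performance and efficiency yields competitive results on benchmark datasets such as PASCAL-5^i and COCO-20^i in both the 1-shot and 5-shot settings. Notably, the model achieves this performance with only 1.5 million parameters while overcoming several limitations of existing methods. The code is open source: https://github.com/amirrezafateh/MSDNet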
Code repository
amirrezafateh/msdnet (official implementation, PyTorch, mentioned on GitHub)
Benchmarks
| Benchmark | Method | Mean IoU (%) | FB-IoU (%) | Learnable parameters (million) |
|---|---|---|---|---|
| few-shot-semantic-segmentation-on-coco-20i | MSDNet (ResNet-101) | 73.9 | - | - |
| few-shot-semantic-segmentation-on-coco-20i | MSDNet (ResNet-50) | 72.1 | - | - |
| few-shot-semantic-segmentation-on-coco-20i-1 | MSDNet (ResNet-101) | 48.5 | 71.3 | 1.5 |
| few-shot-semantic-segmentation-on-coco-20i-1 | MSDNet (ResNet-50) | 46.5 | 70.4 | 1.5 |
| few-shot-semantic-segmentation-on-coco-20i-2 | MSDNet (ResNet-101) | 76.4 | - | - |
| few-shot-semantic-segmentation-on-coco-20i-2 | MSDNet (ResNet-50) | 74.2 | - | - |
| few-shot-semantic-segmentation-on-coco-20i-5 | MSDNet (ResNet-50) | 54.5 | 74.5 | 1.5 |
| few-shot-semantic-segmentation-on-coco-20i-5 | MSDNet (ResNet-101) | 55.3 | 75.1 | 1.5 |
| few-shot-semantic-segmentation-on-pascal-5i-1 | MSDNet (ResNet-101) | 64.7 | 77.3 | 1.5 |
| few-shot-semantic-segmentation-on-pascal-5i-1 | MSDNet (ResNet-50) | 64.3 | 77.1 | 1.5 |
| few-shot-semantic-segmentation-on-pascal-5i-5 | MSDNet (ResNet-50) | 68.7 | 82.1 | 1.5 |
| few-shot-semantic-segmentation-on-pascal-5i-5 | MSDNet (ResNet-101) | 70.8 | 85.0 | 1.5 |
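
The table reports Mean IoU and FB-IoU. As a rough illustration of how these two metrics are commonly computed in few-shot segmentation, here is a minimal pure-Python sketch. It assumes the widespread convention in which Mean IoU accumulates foreground intersection and union per class over all test episodes and then averages across classes, while FB-IoU ignores class identity and averages the foreground IoU and the background IoU. The function names (`iou_counts`, `ratio`, `evaluate`) and the toy masks are hypothetical, and the exact evaluation protocol used by MSDNet may differ in its details.

```python
from collections import defaultdict


def iou_counts(pred, target, label):
    """Count intersection and union pixels for `label` in two 2D masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += int(p == label and t == label)
            union += int(p == label or t == label)
    return inter, union


def ratio(inter, union):
    """IoU from accumulated counts; defined as 0.0 when the union is empty."""
    return inter / union if union else 0.0


def evaluate(episodes):
    """Return (mean_iou, fb_iou) in percent.

    `episodes` is an iterable of (class_id, pred_mask, target_mask),
    where masks are 2D lists with 1 for foreground and 0 for background.
    This layout is an assumption made for the sketch.
    """
    class_counts = defaultdict(lambda: [0, 0])  # class_id -> [inter, union]
    fg = [0, 0]  # foreground counts over all episodes, class-agnostic
    bg = [0, 0]  # background counts over all episodes, class-agnostic
    for class_id, pred, target in episodes:
        f_inter, f_union = iou_counts(pred, target, 1)
        b_inter, b_union = iou_counts(pred, target, 0)
        class_counts[class_id][0] += f_inter
        class_counts[class_id][1] += f_union
        fg[0] += f_inter
        fg[1] += f_union
        bg[0] += b_inter
        bg[1] += b_union
    class_ious = [ratio(i, u) for i, u in class_counts.values()]
    mean_iou = 100.0 * sum(class_ious) / len(class_ious) if class_ious else 0.0
    fb_iou = 100.0 * (ratio(*fg) + ratio(*bg)) / 2
    return mean_iou, fb_iou


# Toy usage with two 2x2 episodes from two different classes.
episodes = [
    (1, [[1, 1], [0, 0]], [[1, 0], [0, 0]]),
    (2, [[1, 1], [1, 0]], [[1, 1], [1, 1]]),
]
mean_iou, fb_iou = evaluate(episodes)
print(f"Mean IoU: {mean_iou:.1f}  FB-IoU: {fb_iou:.1f}")
```

Because Mean IoU is averaged per class while FB-IoU also rewards correct background pixels, the two numbers can diverge noticeably, which is consistent with FB-IoU being higher than Mean IoU in every row of the table that reports both.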