4 months ago

Masked-attention Mask Transformer for Universal Image Segmentation

Abstract

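Image segmentation groups pixels that share the same semantics (for example, category or instance membership), and each choice of semantics defines a task. Although the tasks differ only in their semantics, current research still focuses on designing a specialized architecture for each one. This paper presents the Masked-attention Mask Transformer (Mask2Former), a new architecture that can address any image segmentation task (panoptic, instance, or semantic). Its key component is masked attention, which extracts localized features by constraining cross-attention to the predicted mask regions. In addition to reducing the research effort by at least three times, Mask2Former significantly outperforms the best specialized architectures on four popular datasets. Most notably, Mask2Former sets a new state of the art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO), and semantic segmentation (57.7 mIoU on ADE20K).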
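The masked attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `masked_cross_attention`, the plain-list data layout, the 1/sqrt(d) scaling, and the fallback to full attention for an empty mask are all assumptions made for illustration, and the residual connection and learned projections of a real Transformer decoder layer are omitted. The idea shown is only that each query's attention score is set to negative infinity at pixel positions outside the mask that query predicted earlier, so the softmax assigns those positions zero weight.

```python
import math


def masked_cross_attention(queries, keys, values, masks):
    """Illustrative masked cross-attention (hypothetical helper, pure Python).

    queries: N lists of length d (one per object query)
    keys, values: P lists of length d (one per flattened pixel position)
    masks: N lists of length P with 0/1 entries, the binarized mask that
           each query predicted in the previous step
    Returns N lists of length d.
    """
    outputs = []
    for q, mask in zip(queries, masks):
        # Assumption: if a query's mask is empty, attend to every position
        # so the softmax is never taken over an empty set.
        allowed = mask if any(mask) else [1] * len(mask)
        scale = 1.0 / math.sqrt(len(q))  # assumed standard dot-product scaling
        scores = [
            sum(a * b for a, b in zip(q, k)) * scale if keep else -math.inf
            for k, keep in zip(keys, allowed)
        ]
        # Numerically stable softmax; masked positions get exp(-inf) = 0.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the values over the allowed positions only.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    # The single query may only look at the third pixel position.
    print(masked_cross_attention([[1.0, 0.0]], keys, values, [[0, 0, 1]]))
```

With the mask `[0, 0, 1]`, the query can only attend to the third position, so its output equals that position's value vector; widening the mask lets the other positions contribute again.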

Benchmarks

| Benchmark | Method | Metrics |
| --- | --- | --- |
| instance-segmentation-on-ade20k-val | Mask2Former (Swin-L, single-scale) | AP: 34.9, APL: 54.7, APM: 40, APS: 16.3 |
| instance-segmentation-on-ade20k-val | Mask2Former (ResNet-50) | APL: 43.1, APM: 28.9 |
| instance-segmentation-on-ade20k-val | Mask2Former (ResNet50) | AP: 26.4, APS: 10.4 |
| instance-segmentation-on-ade20k-val | Mask2Former (Swin-L + FAPN) | AP: 33.4, APL: 54.6, APM: 37.6, APS: 14.6 |
| instance-segmentation-on-cityscapes-val | Mask2Former (Swin-L, single-scale) | mask AP: 43.7 |
| instance-segmentation-on-cityscapes-val | Mask2Former (Swin-S) | mask AP: 41.8 |
| instance-segmentation-on-cityscapes-val | Mask2Former (ResNet-101) | mask AP: 38.5 |
| instance-segmentation-on-cityscapes-val | Mask2Former (Swin-B) | mask AP: 42 |
| instance-segmentation-on-cityscapes-val | Mask2Former (Swin-T) | mask AP: 39.7 |
| instance-segmentation-on-cityscapes-val | Mask2Former (ResNet-50) | mask AP: 37.4 |
| instance-segmentation-on-coco | Mask2Former (Swin-L, single scale) | AP50: 74.9, AP75: 54.9, APL: 71.2, APM: 53.8, APS: 29.1, mask AP: 50.5 |
| instance-segmentation-on-coco-minival | Mask2Former (Swin-L) | mask AP: 50.1 |
| instance-segmentation-on-coco-val-panoptic | Mask2Former (Swin-L, single-scale) | AP: 49.1 |
| panoptic-segmentation-on-ade20k-val | Mask2Former (Swin-L) | AP: 34.2, PQ: 48.1, mIoU: 54.5 |
| panoptic-segmentation-on-ade20k-val | Mask2Former (ResNet-50, 640x640) | AP: 26.5, mIoU: 46.1 |
| panoptic-segmentation-on-ade20k-val | Mask2Former (ResNet-50, 640x640) | PQ: 39.7 |
| panoptic-segmentation-on-ade20k-val | Mask2Former (Swin-L + FAPN, 640x640) | AP: 33.2, PQ: 46.2, mIoU: 55.4 |
| panoptic-segmentation-on-ade20k-val | Panoptic-DeepLab (SwideRNet) | PQ: 37.9, mIoU: 50 |
| panoptic-segmentation-on-cityscapes-val | Mask2Former (Swin-L) | AP: 43.6, PQ: 66.6, mIoU: 82.9 |
| panoptic-segmentation-on-coco-minival | Mask2Former (single-scale) | AP: 48.6, PQ: 57.8, PQst: 48.1, PQth: 64.2 |
| panoptic-segmentation-on-coco-test-dev | Mask2Former (Swin-L) | PQ: 58.3, PQst: 48.1, PQth: 65.1 |
| semantic-segmentation-on-ade20k | Mask2Former (SwinL-FaPN) | Validation mIoU: 57.7 |
| semantic-segmentation-on-ade20k | Mask2Former (Swin-L-FaPN) | Validation mIoU: 56.4 |
| semantic-segmentation-on-ade20k | Mask2Former (SwinL) | Validation mIoU: 57.3 |
| semantic-segmentation-on-ade20k | Mask2Former (Swin-B) | Validation mIoU: 55.1 |
| semantic-segmentation-on-ade20k-val | Mask2Former (Swin-L-FaPN, multiscale) | mIoU: 57.7 |
| semantic-segmentation-on-ade20k-val | Mask2Former (Swin-L-FaPN) | mIoU: 56.4 |
| semantic-segmentation-on-cityscapes-val | Mask2Former (Swin-L) | mIoU: 84.3 |
| semantic-segmentation-on-coco-1 | MaskFormer (Swin-L, single-scale) | mIoU: 64.8 |
| semantic-segmentation-on-coco-1 | Mask2Former (Swin-L, single-scale) | mIoU: 67.4 |
| semantic-segmentation-on-mapillary-val | Mask2Former (Swin-L, multiscale) | mIoU: 64.7 |
