3 个月前

Swin Transformer:基于移位窗口的分层视觉Transformer

Swin Transformer:基于移位窗口的分层视觉Transformer

摘要

本文提出了一种新型视觉Transformer——Swin Transformer,能够作为计算机视觉领域的通用主干网络(backbone)。将Transformer从自然语言处理领域迁移到视觉任务面临诸多挑战,主要源于两个领域的本质差异:视觉实体的尺度变化极大,且图像像素的分辨率远高于文本中单词的表示粒度。为应对这些差异,我们设计了一种分层的Transformer架构,其特征表示通过移位窗口(Shifted Windows)机制进行计算。该移位窗口机制在保持自注意力计算局限于非重叠局部窗口以提升效率的同时,仍能实现跨窗口的信息交互。这种分层结构具备在多尺度上建模的灵活性,并且其计算复杂度与图像尺寸呈线性关系。上述特性使得Swin Transformer能够广泛适用于各类视觉任务,包括图像分类(在ImageNet-1K上达到87.3%的Top-1准确率)、密集预测任务如目标检测(在COCO test-dev上实现58.7 box AP和51.1 mask AP)以及语义分割(在ADE20K验证集上达到53.5 mIoU)。其性能显著超越此前的最先进方法,在COCO数据集上分别提升了+2.7 box AP和+2.6 mask AP,在ADE20K上提升了+3.2 mIoU,充分展示了基于Transformer的模型作为视觉主干网络的巨大潜力。此外,该分层设计与移位窗口策略对纯MLP架构也具有显著的提升作用。相关代码与预训练模型已公开发布于:https://github.com/microsoft/Swin-Transformer。

代码仓库

mzeromiko/vmamba
pytorch
GitHub 中提及
rami0205/ngramswin
pytorch
GitHub 中提及
open-mmlab/mmpose
pytorch
GitHub 中提及
liuxingwt/CLS
pytorch
GitHub 中提及
USTC-IMCL/HST-for-Compressed-SR
pytorch
GitHub 中提及
microsoft/Swin-Transformer
官方
pytorch
GitHub 中提及
rwightman/pytorch-image-models
pytorch
GitHub 中提及
SwinTransformer/Transformer-SSL
pytorch
GitHub 中提及
shellredia/snake-swin-octa
pytorch
GitHub 中提及
ayanglab/swinmr
pytorch
GitHub 中提及
berniwal/swin-transformer-pytorch
pytorch
GitHub 中提及
BR-IDL/PaddleViT
paddle
GitHub 中提及
Gojay001/toolkit-DeepLearning
pytorch
GitHub 中提及
abman23/pmnet
pytorch
GitHub 中提及
nathanlem1/igae-net
pytorch
GitHub 中提及
weiwang31/icemamba
pytorch
GitHub 中提及
xiaohu2015/swint_detectron2
pytorch
GitHub 中提及
IMvision12/keras-vision-models
pytorch
GitHub 中提及
fogfog2/packnet
pytorch
GitHub 中提及
huggingface/transformers
pytorch
GitHub 中提及
canerozer/qct
pytorch
GitHub 中提及
AntixK/PyTorch-Model-Compare
pytorch
GitHub 中提及
facebookresearch/hiera
pytorch
GitHub 中提及
DarshanDeshpande/jax-models
jax
GitHub 中提及
Myyyr/transseg2d
pytorch
GitHub 中提及
WangFeng18/Swin-Transformer
pytorch
GitHub 中提及
holdfire/CLS
pytorch
GitHub 中提及
YongWookHa/swin-transformer-ocr
pytorch
GitHub 中提及
innat/VideoSwin
tf
GitHub 中提及
ayanglab/swinganmr
pytorch
GitHub 中提及
SforAiDl/vformer
pytorch
GitHub 中提及
open-edge-platform/geti
pytorch
GitHub 中提及
befallenStar/molecularAtt
pytorch
GitHub 中提及
yangyangxu0/demt
pytorch
GitHub 中提及
holdfire/FAS
pytorch
GitHub 中提及
LiWentomng/OrientedRepPoints
pytorch
GitHub 中提及

基准测试

基准方法指标
image-classification-on-imagenetSwin-B
GFLOPs: 47
Number of params: 88M
Top 1 Accuracy: 86.4%
image-classification-on-imagenetSwin-L
GFLOPs: 103.9
Number of params: 197M
Top 1 Accuracy: 87.3%
image-classification-on-imagenetSwin-T
GFLOPs: 4.5
Number of params: 29M
Top 1 Accuracy: 81.3%
image-classification-on-omnibenchmarkSwinTransformer
Average Top-1 Accuracy: 46.4
instance-segmentation-on-cocoSwin-L (HTC++, multi scale)
mask AP: 51.1
instance-segmentation-on-cocoSwin-L (HTC++, single scale)
mask AP: 50.2
instance-segmentation-on-coco-minivalSwin-L (HTC++, multi scale)
mask AP: 50.4
instance-segmentation-on-coco-minivalSwin-L (HTC++, single scale)
mask AP: 49.5
instance-segmentation-on-occluded-cocoSwin-S + Mask R-CNN
Mean Recall: 61.14
instance-segmentation-on-occluded-cocoSwin-T + Mask R-CNN
Mean Recall: 58.81
instance-segmentation-on-occluded-cocoSwin-B + Cascade Mask R-CNN
Mean Recall: 62.90
instance-segmentation-on-separated-cocoSwin-B + Cascade Mask R-CNN
Mean Recall: 36.31
instance-segmentation-on-separated-cocoSwin-S + Mask R-CNN
Mean Recall: 33.67
instance-segmentation-on-separated-cocoSwin-T + Mask R-CNN
Mean Recall: 31.94
object-detection-on-cocoSwin-L (HTC++, single scale)
box mAP: 57.7
object-detection-on-cocoSwin-L (HTC++, multi scale)
box mAP: 58.7
object-detection-on-coco-minivalSwin-L (HTC++, single scale)
box AP: 57.1
object-detection-on-coco-minivalSwin-L (HTC++, multi scale)
box AP: 58
semantic-segmentation-on-ade20kSwin-B (UperNet, ImageNet-1k pretrain)
Validation mIoU: 49.7
semantic-segmentation-on-ade20kSwin-L (UperNet, ImageNet-22k pretrain)
Test Score: 62.8
Validation mIoU: 53.50
semantic-segmentation-on-ade20k-valSwin-L (UperNet, ImageNet-22k pretrain)
mIoU: 53.5
semantic-segmentation-on-ade20k-valSwin-B (UperNet, ImageNet-1k pretrain)
mIoU: 49.7
semantic-segmentation-on-foodseg103Swin-Transformer (Swin-Small)
mIoU: 41.6
thermal-image-segmentation-on-mfn-datasetSwinT
mIOU: 49.0

用 AI 构建 AI

从想法到上线——通过免费 AI 协同编程、开箱即用的环境和市场最优价格的 GPU 加速您的 AI 开发

AI 协同编程
即用型 GPU
最优价格
立即开始

Hyper Newsletters

订阅我们的最新资讯
我们会在北京时间 每周一的上午九点 向您的邮箱投递本周内的最新更新
邮件发送服务由 MailChimp 提供