
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Ze Liu Yutong Lin Yue Cao Han Hu Yixuan Wei Zheng Zhang Stephen Lin Baining Guo

Abstract

This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with Shifted windows (Swin). The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks, including image classification (87.3 top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU on ADE20K val). Its performance surpasses the previous state-of-the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones. The hierarchical design and the shifted window approach also prove beneficial for all-MLP architectures. The code and models are publicly available at https://github.com/microsoft/Swin-Transformer.
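The windowing, shifting, hierarchy, and complexity claims in the abstract can be illustrated with a small NumPy sketch. This is a minimal illustration, not the official implementation, and all function names are hypothetical. It assumes a single (H, W, C) feature map with H and W divisible by the window size M, single-head attention with identity Q/K/V projections, no relative position bias, and no attention mask for the shifted configuration (the paper uses one). The two cost functions encode the complexity formulas given in the Swin paper for global multi-head self-attention, 4hwC² + 2(hw)²C, and window-based self-attention, 4hwC² + 2M²hwC, on an h × w map with C channels.

```python
import numpy as np


def window_partition(x: np.ndarray, M: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping M x M windows -> (nW, M, M, C).

    Assumes H and W are divisible by M (the real model pads otherwise).
    """
    H, W, C = x.shape
    x = x.reshape(H // M, M, W // M, M, C)
    x = x.transpose(0, 2, 1, 3, 4)  # bring the two window-grid axes to the front
    return x.reshape(-1, M, M, C)


def window_reverse(windows: np.ndarray, M: int, H: int, W: int) -> np.ndarray:
    """Inverse of window_partition: (nW, M, M, C) -> (H, W, C)."""
    C = windows.shape[-1]
    x = windows.reshape(H // M, W // M, M, M, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)


def window_self_attention(windows: np.ndarray) -> np.ndarray:
    """Single-head softmax attention computed independently inside each window.

    Simplification: identity Q/K/V projections, no relative position bias.
    """
    nW, M, _, C = windows.shape
    t = windows.reshape(nW, M * M, C)               # M*M tokens per window
    scores = t @ t.transpose(0, 2, 1) / np.sqrt(C)  # (nW, M*M, M*M)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return (attn @ t).reshape(nW, M, M, C)


def shifted_window_attention(x: np.ndarray, M: int) -> np.ndarray:
    """Cyclically shift by M // 2, attend within windows, then shift back.

    The paper also masks attention between tokens that the cyclic shift places
    in the same window but that were not adjacent in the original map; that
    mask is omitted in this sketch.
    """
    H, W, _ = x.shape
    s = M // 2
    shifted = np.roll(x, shift=(-s, -s), axis=(0, 1))
    out = window_reverse(window_self_attention(window_partition(shifted, M)), M, H, W)
    return np.roll(out, shift=(s, s), axis=(0, 1))


def patch_merging(x: np.ndarray) -> np.ndarray:
    """Concatenate each 2 x 2 neighborhood: (H, W, C) -> (H/2, W/2, 4C).

    The paper follows this with a learned linear layer from 4C to 2C (omitted).
    """
    return np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]], axis=-1
    )


def msa_flops(h: int, w: int, C: int) -> int:
    """Global self-attention cost: 4hwC^2 + 2(hw)^2 C, quadratic in hw."""
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C


def wmsa_flops(h: int, w: int, C: int, M: int) -> int:
    """Window self-attention cost: 4hwC^2 + 2M^2 hwC, linear in hw for fixed M."""
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((8, 8, 3))
    print(window_partition(x, 4).shape)          # (4, 4, 4, 3)
    print(shifted_window_attention(x, 4).shape)  # (8, 8, 3)
    print(patch_merging(x).shape)                # (4, 4, 12)
    print(msa_flops(56, 56, 96), wmsa_flops(56, 56, 96, 7))
```

The last line compares the two cost estimates for Swin-T's first-stage setting (56 × 56 tokens, C = 96, M = 7). Alternating plain and shifted window attention in consecutive blocks is what lets information cross window borders while every attention call stays local, and repeated patch merging halves the spatial resolution stage by stage to build the hierarchy.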

Code Repositories

yangyucheng000/University/tree/main/model-3/swin (mindspore)
innat/HybridModel-GradCAM
mzeromiko/vmamba (pytorch; mentioned in GitHub)
jahongir7174/MaskRCNN (pytorch)
ustc-imcl/hst-for-compressed-image-sr (pytorch; mentioned in GitHub)
PaddlePaddle/PaddleSeg (paddle)
rami0205/ngramswin (pytorch; mentioned in GitHub)
open-mmlab/mmpose (pytorch; mentioned in GitHub)
liuxingwt/CLS (pytorch; mentioned in GitHub)
pytorch/vision (pytorch)
USTC-IMCL/HST-for-Compressed-SR (pytorch; mentioned in GitHub)
microsoft/Swin-Transformer (official; pytorch; mentioned in GitHub)
rwightman/pytorch-image-models (pytorch; mentioned in GitHub)
SwinTransformer/Transformer-SSL (pytorch; mentioned in GitHub)
PaddlePaddle/PASSL (paddle)
shellredia/snake-swin-octa (pytorch; mentioned in GitHub)
JIA-HONG-CHU/Swin-Transformer-add-EncNet-DaNet-DraNet-for-semantic-segmentation-on-Statelite-Dataset (pytorch; mentioned in GitHub)
ayanglab/swinmr (pytorch; mentioned in GitHub)
berniwal/swin-transformer-pytorch (pytorch; mentioned in GitHub)
Mind23-2/MindCode-117 (mindspore)
PaddlePaddle/PaddleClas (paddle)
BR-IDL/PaddleViT (paddle; mentioned in GitHub)
Gojay001/toolkit-DeepLearning (pytorch; mentioned in GitHub)
abman23/pmnet (pytorch; mentioned in GitHub)
open-mmlab/mmdetection (pytorch)
Burf/SwinTransformer-Tensorflow2 (mentioned in GitHub)
nathanlem1/igae-net (pytorch; mentioned in GitHub)
weiwang31/icemamba (pytorch; mentioned in GitHub)
PaddlePaddle/PaddleDetection (paddle)
code-implementation1/Code9/tree/main/SwinTransformer (mindspore)
xiaohu2015/swint_detectron2 (pytorch; mentioned in GitHub)
sayakpaul/swin-transformers-tf (mentioned in GitHub)
martinsbruveris/tensorflow-image-models (mentioned in GitHub)
IMvision12/keras-vision-models (pytorch; mentioned in GitHub)
alibaba/EasyCV (pytorch)
open-mmlab/mmclassification (pytorch)
fogfog2/packnet (pytorch; mentioned in GitHub)
zhangbo2008/swin-transformer_noted_very_detail (pytorch; mentioned in GitHub)
open-edge-platform/training_extensions (pytorch)
megvii-research/basecls/tree/main/zoo/public/swin
huggingface/transformers (pytorch; mentioned in GitHub)
canerozer/qct (pytorch; mentioned in GitHub)
https://gitlab.com/birder/birder (pytorch)
AntixK/PyTorch-Model-Compare (pytorch; mentioned in GitHub)
PaddlePaddle/PLSC/tree/master/task/classification/swin (paddle)
facebookresearch/hiera (pytorch; mentioned in GitHub)
NEUdeep/Swin-Transformer-Object-Detection (pytorch; mentioned in GitHub)
MS-Mind/MS-Code-02/tree/main/configs/swintransformer (mindspore)
Mind23-2/MindCode-155 (mindspore)
DarshanDeshpande/jax-models (jax; mentioned in GitHub)
DominickZhang/Distillation-Swin-Transformer (pytorch; mentioned in GitHub)
Myyyr/transseg2d (pytorch; mentioned in GitHub)
WangFeng18/Swin-Transformer (pytorch; mentioned in GitHub)
shkarupa-alex/tfswin
yingkaisha/keras-vision-transformer (mentioned in GitHub)
holdfire/CLS (pytorch; mentioned in GitHub)
lyqcom/aaic_swintransformerv2 (mindspore)
YongWookHa/swin-transformer-ocr (pytorch; mentioned in GitHub)
HzcIrving/DeepLearning_PlayGround/tree/main/Swin-Transformer (pytorch)
innat/VideoSwin (mentioned in GitHub)
ayanglab/swinganmr (pytorch; mentioned in GitHub)
SforAiDl/vformer (pytorch; mentioned in GitHub)
mujiyantosvc/Facial-Expression-Recognition-FER-for-Mental-Health-Detection- (pytorch; mentioned in GitHub)
layumi/Person_reID_baseline_pytorch (pytorch)
MindCode-4/code-7/tree/main/isr (mindspore)
mindspore-ecosystem/mindcv/blob/main/mindcv/models/swin_transformer.py (mindspore)
open-edge-platform/geti (pytorch; mentioned in GitHub)
befallenStar/molecularAtt (pytorch; mentioned in GitHub)
yangyangxu0/demt (pytorch; mentioned in GitHub)
towhee-io/towhee (pytorch)
holdfire/FAS (pytorch; mentioned in GitHub)
keras-team/keras-io/blob/master/examples/vision/swin_transformers.py
VcampSoldiers/Swin-Transformer-Tensorflow (mentioned in GitHub)
Burf/tfdetection
koechslin/swin-transformer-semantic-segmentation (pytorch; mentioned in GitHub)
LiWentomng/OrientedRepPoints (pytorch; mentioned in GitHub)
rishigami/Swin-Transformer-TF
Mind23-2/MindCode-165 (mindspore)
shinya7y/UniverseNet (pytorch)
mindspore-courses/External-Attention-MindSpore/blob/main/model/backbone/swin_transformer.py (mindspore)

Benchmarks

Benchmark	Model	Metrics
image-classification-on-imagenet	Swin-B	GFLOPs: 47 Number of params: 88M Top 1 Accuracy: 86.4%
image-classification-on-imagenet	Swin-L	GFLOPs: 103.9 Number of params: 197M Top 1 Accuracy: 87.3%
image-classification-on-imagenet	Swin-T	GFLOPs: 4.5 Number of params: 29M Top 1 Accuracy: 81.3%
image-classification-on-omnibenchmark	SwinTransformer	Average Top-1 Accuracy: 46.4
instance-segmentation-on-coco	Swin-L (HTC++, multi scale)	mask AP: 51.1
instance-segmentation-on-coco	Swin-L (HTC++, single scale)	mask AP: 50.2
instance-segmentation-on-coco-minival	Swin-L (HTC++, multi scale)	mask AP: 50.4
instance-segmentation-on-coco-minival	Swin-L (HTC++, single scale)	mask AP: 49.5
instance-segmentation-on-occluded-coco	Swin-S + Mask R-CNN	Mean Recall: 61.14
instance-segmentation-on-occluded-coco	Swin-T + Mask R-CNN	Mean Recall: 58.81
instance-segmentation-on-occluded-coco	Swin-B + Cascade Mask R-CNN	Mean Recall: 62.90
instance-segmentation-on-separated-coco	Swin-B + Cascade Mask R-CNN	Mean Recall: 36.31
instance-segmentation-on-separated-coco	Swin-S + Mask R-CNN	Mean Recall: 33.67
instance-segmentation-on-separated-coco	Swin-T + Mask R-CNN	Mean Recall: 31.94
object-detection-on-coco	Swin-L (HTC++, single scale)	box mAP: 57.7
object-detection-on-coco	Swin-L (HTC++, multi scale)	box mAP: 58.7
object-detection-on-coco-minival	Swin-L (HTC++, single scale)	box AP: 57.1
object-detection-on-coco-minival	Swin-L (HTC++, multi scale)	box AP: 58
semantic-segmentation-on-ade20k	Swin-B (UperNet, ImageNet-1k pretrain)	Validation mIoU: 49.7
semantic-segmentation-on-ade20k	Swin-L (UperNet, ImageNet-22k pretrain)	Test Score: 62.8 Validation mIoU: 53.50
semantic-segmentation-on-ade20k-val	Swin-L (UperNet, ImageNet-22k pretrain)	mIoU: 53.5
semantic-segmentation-on-ade20k-val	Swin-B (UperNet, ImageNet-1k pretrain)	mIoU: 49.7
semantic-segmentation-on-foodseg103	Swin-Transformer (Swin-Small)	mIoU: 41.6
thermal-image-segmentation-on-mfn-dataset	SwinT	mIoU: 49.0
