
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo

Abstract

This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text. To address these differences, we propose a hierarchical Transformer whose representation is computed with Shifted windows (hence "Swin"). The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities of Swin Transformer make it compatible with a broad range of vision tasks, including image classification (87.3% top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU on ADE20K val). Its performance surpasses the previous state of the art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones. The hierarchical design and the shifted window approach also prove beneficial for all-MLP architectures. The code and models are publicly available at https://github.com/microsoft/Swin-Transformer.
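
The efficiency claim has a concrete form in the paper: for an h×w feature map with C channels, global multi-head self-attention costs 4hwC^2 + 2(hw)^2·C, while window-based attention over M×M windows costs 4hwC^2 + 2M^2·hw·C, which is linear in hw for a fixed M. Cross-window connection comes from alternating the window grid between consecutive layers: one layer uses a regular partition, and the next cyclically shifts the feature map by ⌊M/2⌋ before partitioning. The sketch below is a minimal PyTorch illustration of those two partitioning steps, not the official implementation; the toy sizes and the `window_partition` helper are our assumptions (the official repo uses a similar utility).

```python
# Minimal sketch of regular vs. shifted window partitioning (illustration
# only; layout and sizes are assumptions, not the official implementation).
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows,
    returning (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)

B, H, W, C = 1, 8, 8, 96      # toy feature map; Swin-T actually uses C=96, M=7
window_size = 4               # M
shift = window_size // 2      # the paper shifts the grid by floor(M/2)

x = torch.randn(B, H, W, C)

# Layer l: regular partition -- self-attention runs inside each 4x4 window.
regular = window_partition(x, window_size)

# Layer l+1: cyclic shift, then partition -- the new windows straddle the
# old window boundaries, connecting tokens that were previously separated.
shifted = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
shifted_windows = window_partition(shifted, window_size)

print(regular.shape, shifted_windows.shape)  # both: torch.Size([4, 4, 4, 96])
```

In the full model, the cyclic shift is paired with an attention mask so that tokens rolled in from the opposite edge of the map do not attend to each other, and the shift is reversed with another `torch.roll` after attention.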

Code Repositories

- mzeromiko/vmamba (pytorch)
- ustc-imcl/hst-for-compressed-image-sr (pytorch)
- rami0205/ngramswin (pytorch)
- open-mmlab/mmpose (pytorch)
- liuxingwt/CLS (pytorch)
- USTC-IMCL/HST-for-Compressed-SR (pytorch)
- microsoft/Swin-Transformer (pytorch, official)
- rwightman/pytorch-image-models (pytorch)
- SwinTransformer/Transformer-SSL (pytorch)
- shellredia/snake-swin-octa (pytorch)
- ayanglab/swinmr (pytorch)
- berniwal/swin-transformer-pytorch (pytorch)
- BR-IDL/PaddleViT (paddle)
- Gojay001/toolkit-DeepLearning (pytorch)
- abman23/pmnet (pytorch)
- nathanlem1/igae-net (pytorch)
- weiwang31/icemamba (pytorch)
- xiaohu2015/swint_detectron2 (pytorch)
- sayakpaul/swin-transformers-tf (tf)
- IMvision12/keras-vision-models (pytorch)
- fogfog2/packnet (pytorch)
- huggingface/transformers (pytorch)
- canerozer/qct (pytorch)
- AntixK/PyTorch-Model-Compare (pytorch)
- facebookresearch/hiera (pytorch)
- DarshanDeshpande/jax-models (jax)
- Myyyr/transseg2d (pytorch)
- WangFeng18/Swin-Transformer (pytorch)
- holdfire/CLS (pytorch)
- YongWookHa/swin-transformer-ocr (pytorch)
- innat/VideoSwin (tf)
- ayanglab/swinganmr (pytorch)
- SforAiDl/vformer (pytorch)
- open-edge-platform/geti (pytorch)
- befallenStar/molecularAtt (pytorch)
- yangyangxu0/demt (pytorch)
- holdfire/FAS (pytorch)
- LiWentomng/OrientedRepPoints (pytorch)

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| Image Classification on ImageNet | Swin-B | GFLOPs: 47 · Params: 88M · Top-1 Accuracy: 86.4% |
| Image Classification on ImageNet | Swin-L | GFLOPs: 103.9 · Params: 197M · Top-1 Accuracy: 87.3% |
| Image Classification on ImageNet | Swin-T | GFLOPs: 4.5 · Params: 29M · Top-1 Accuracy: 81.3% |
| Image Classification on OmniBenchmark | SwinTransformer | Average Top-1 Accuracy: 46.4 |
| Instance Segmentation on COCO | Swin-L (HTC++, multi scale) | mask AP: 51.1 |
| Instance Segmentation on COCO | Swin-L (HTC++, single scale) | mask AP: 50.2 |
| Instance Segmentation on COCO minival | Swin-L (HTC++, multi scale) | mask AP: 50.4 |
| Instance Segmentation on COCO minival | Swin-L (HTC++, single scale) | mask AP: 49.5 |
| Instance Segmentation on Occluded COCO | Swin-S + Mask R-CNN | Mean Recall: 61.14 |
| Instance Segmentation on Occluded COCO | Swin-T + Mask R-CNN | Mean Recall: 58.81 |
| Instance Segmentation on Occluded COCO | Swin-B + Cascade Mask R-CNN | Mean Recall: 62.90 |
| Instance Segmentation on Separated COCO | Swin-B + Cascade Mask R-CNN | Mean Recall: 36.31 |
| Instance Segmentation on Separated COCO | Swin-S + Mask R-CNN | Mean Recall: 33.67 |
| Instance Segmentation on Separated COCO | Swin-T + Mask R-CNN | Mean Recall: 31.94 |
| Object Detection on COCO | Swin-L (HTC++, single scale) | box mAP: 57.7 |
| Object Detection on COCO | Swin-L (HTC++, multi scale) | box mAP: 58.7 |
| Object Detection on COCO minival | Swin-L (HTC++, single scale) | box AP: 57.1 |
| Object Detection on COCO minival | Swin-L (HTC++, multi scale) | box AP: 58.0 |
| Semantic Segmentation on ADE20K | Swin-B (UperNet, ImageNet-1k pretrain) | Validation mIoU: 49.7 |
| Semantic Segmentation on ADE20K | Swin-L (UperNet, ImageNet-22k pretrain) | Test Score: 62.8 · Validation mIoU: 53.5 |
| Semantic Segmentation on ADE20K val | Swin-L (UperNet, ImageNet-22k pretrain) | mIoU: 53.5 |
| Semantic Segmentation on ADE20K val | Swin-B (UperNet, ImageNet-1k pretrain) | mIoU: 49.7 |
| Semantic Segmentation on FoodSeg103 | Swin-Transformer (Swin-Small) | mIoU: 41.6 |
| Thermal Image Segmentation on MFN dataset | SwinT | mIoU: 49.0 |
