Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

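Abstract
While the Transformer architecture has become the de facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional neural networks (CNNs) or used to replace certain components of CNNs while keeping their overall structure in place. We show that this reliance on CNNs is not necessary: a pure Transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.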
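To make "sequences of image patches" concrete, the following is a minimal NumPy sketch of how an image might be cut into non-overlapping square patches and flattened into a token sequence. The function name `image_to_patches`, the 224×224 input size, and the patch size of 16 are illustrative assumptions, not details taken from this page; the learned linear projection, position embeddings, and class token used by a full model are omitted.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Hypothetical helper for illustration only.
    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row (left to right, top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # (grid_h, P, grid_w, P, C): split each spatial axis into (grid, within-patch)
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    # (grid_h, grid_w, P, P, C): bring the two grid axes to the front
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten the grid into a sequence and each patch into a vector
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * c)


# Assumed example input: one 224x224 RGB image filled with a running index
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = image_to_patches(image, patch_size=16)
print(patches.shape)  # (196, 768)
```

Under these assumptions the image becomes a sequence of 196 tokens (a 14×14 grid), each a 768-dimensional vector (16·16·3), which is the kind of input a standard Transformer encoder can consume after a linear projection.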
Code repositories
| Repository | Framework | Notes |
|---|---|---|
| rayanramoul/Visual-Transformer-PyTorch | pytorch | Mentioned in GitHub |
| UdbhavPrasad072300/Transformer-Implementations | pytorch | Mentioned in GitHub |
| Thanusan19/Vision_Transformer | jax | Mentioned in GitHub |
| YanYan0716/vision_transform | tf | Mentioned in GitHub |
| ludics/ViT-Retri | pytorch | Mentioned in GitHub |
| kornia/kornia | pytorch | - |
| SupreethRao99/VisionTransformer | pytorch | Mentioned in GitHub |
| quanmario0311/ViT_PyTorch | pytorch | Mentioned in GitHub |
| AlifAshrafee/ViT-pytorch-for-Cooking-State-Recognition | pytorch | Mentioned in GitHub |
| haiyang-w/git | pytorch | Mentioned in GitHub |
| mtancak1/PyTorch-ViT-Visual-Transformer | pytorch | Mentioned in GitHub |
| ruiqirichard/eegeyenet-vit | pytorch | Mentioned in GitHub |
| james77777778/keras-image-models | pytorch | Mentioned in GitHub |
| KiUngSong/Vision | pytorch | Mentioned in GitHub |
| DavidLandup0/deepvision | pytorch | - |
| nima1999nikkhah/ViT-Hybrid | pytorch | Mentioned in GitHub |
| timH6502/VisionTransformer-PyTorch | pytorch | Mentioned in GitHub |
| liuxingwt/CLS | pytorch | Mentioned in GitHub |
| konstantinos-p/image_classification_SOTA | pytorch | Mentioned in GitHub |
| qiaopTDUN/mae-repo | pytorch | Mentioned in GitHub |
| sayannath/ViT-Image-Classification | - | Mentioned in GitHub |
| SHI-Labs/Compact-Transformers | pytorch | Mentioned in GitHub |
| faustomorales/vit-keras | tf | Mentioned in GitHub |
| asarigun/TransGAN | pytorch | Mentioned in GitHub |
| PaddlePaddle/PASSL | paddle | - |
| shahrukhx01/ocr-test | pytorch | Mentioned in GitHub |
| charchit7/Using_Transoformers | pytorch | Mentioned in GitHub |
| rwightman/pytorch-image-models | pytorch | Mentioned in GitHub |
| BaiqiangGit/15minCode | pytorch | Mentioned in GitHub |
| wangguanan/light-reid | pytorch | Mentioned in GitHub |
| jiangtaoxie/So-ViT | pytorch | Mentioned in GitHub |
| naver-ai/pflayer | pytorch | Mentioned in GitHub |
| SrinjaySarkar/ViT | pytorch | Mentioned in GitHub |
| Westlake-AI/openmixup | pytorch | Mentioned in GitHub |
| Ugenteraan/Masked-AutoEncoder-PyTorch | pytorch | Mentioned in GitHub |
| bshantam97/Attention_Based_Networks | pytorch | Mentioned in GitHub |
| smu-ivpl/DeepfakeDetection | pytorch | Mentioned in GitHub |
| lucidrains/vit-pytorch | pytorch | - |
| ZhouDaShan123/vit | mindspore | - |
| seujung/pytorch-vit | pytorch | - |
| The-AI-Summer/self_attention | pytorch | - |
| jaketae/mlp-mixer | pytorch | Mentioned in GitHub |
| PaddlePaddle/PaddleClas | paddle | - |
| innat/LearnedResizer-Vision-Transformer | tf | Mentioned in GitHub |
| conceptofmind/ViT-haiku | jax | Mentioned in GitHub |
| Julien-pour/music_classifcation | pytorch | Mentioned in GitHub |
| gnoses/ViT_examples | pytorch | Mentioned in GitHub |
| TACJu/TransFG | pytorch | Mentioned in GitHub |
| kingcong/vit | mindspore | Mentioned in GitHub |
| gupta-abhay/ViT | pytorch | - |
| skchen1993/TrangFG | pytorch | Mentioned in GitHub |
| BrianPulfer/PapersReimplementations | pytorch | Mentioned in GitHub |
| septmars/DL | pytorch | - |
| gimme1dollar/vision-transformer | - | Mentioned in GitHub |
| Abdulrahman-Adel/Real-Life-Violence-Detection | tf | Mentioned in GitHub |
| UdbhavPrasad072300/Transformer-Implementation | pytorch | Mentioned in GitHub |
| ra1ph2/Vision-Transformer | pytorch | - |
| ashishpatel26/Vision-Transformer-Keras-Tensorflow-Pytorch-Examples | pytorch | Mentioned in GitHub |
| tw-yuhsi/a-new-perspective-for-shuttlecock-hitting-event-detection | pytorch | Mentioned in GitHub |
| BebDong/MXNetSeg | mxnet | Mentioned in GitHub |
| KatherLab/HIA | pytorch | Mentioned in GitHub |
| facebookresearch/vissl | pytorch | Mentioned in GitHub |
| drumpt/ViT | pytorch | Mentioned in GitHub |
| google-research/vision_transformer | jax | Official; mentioned in GitHub |
| martinsbruveris/tensorflow-image-models | tf | Mentioned in GitHub |
| dispink/xpt | pytorch | Mentioned in GitHub |
| alibaba/EasyCV | pytorch | - |
| kakaobrain/coyo-dataset | pytorch | Mentioned in GitHub |
| IMvision12/keras-vision-models | pytorch | Mentioned in GitHub |
| open-mmlab/mmclassification | pytorch | - |
| alililia/vit_base_GPU | mindspore | Mentioned in GitHub |
| sangHa0411/VIT | pytorch | Mentioned in GitHub |
| wish44165/A-New-Perspective-for-Shuttlecock-Hitting-Event-Detection | pytorch | Mentioned in GitHub |
| sneakatyou/ViT-Tensorflow-2.0 | tf | Mentioned in GitHub |
| stevenwalton/scs-cct | pytorch | Mentioned in GitHub |
| huggingface/transformers | pytorch | Mentioned in GitHub |
| 04RR/SOTA-Vision | pytorch | Mentioned in GitHub |
| YousefGamal220/Vision-Transformers | pytorch | Mentioned in GitHub |
| Burf/VisionTransformer-Tensorflow2 | tf | Mentioned in GitHub |
| junyongyou/triq | pytorch | Mentioned in GitHub |
| nachiket273/Vision_transformer_pytorch | pytorch | Mentioned in GitHub |
| alililia/vit_base_Ascend | mindspore | Mentioned in GitHub |
| facebookresearch/hiera | pytorch | Mentioned in GitHub |
| Mind23-2/MindCode-89 | mindspore | Mentioned in GitHub |
| mtancak/PyTorch-ViT-Visual-Transformer | pytorch | Mentioned in GitHub |
| jacobgil/vit-explain | pytorch | Mentioned in GitHub |
| ttt496/VisionTransformer | jax | Mentioned in GitHub |
| HyeonhoonLee/MAIC2021_Sleep | pytorch | Mentioned in GitHub |
| sliao-mi-luku/Galaxy-Zoo-Classification | pytorch | Mentioned in GitHub |
| purbayankar/Hyperspectral-Vision-Transformer | pytorch | Mentioned in GitHub |
| davisking/dlib-models | - | Mentioned in GitHub |
| TheTensorDude/vision_transformer_tf | tf | Mentioned in GitHub |
| gmum/dl-mo-2021 | - | Mentioned in GitHub |
| Mayurji/Image-Classification-PyTorch | pytorch | Mentioned in GitHub |
| holdfire/CLS | pytorch | Mentioned in GitHub |
| Kevinz-code/CSRA | pytorch | Mentioned in GitHub |
| Aedelon/ViT-PyTorch-Replication | pytorch | Mentioned in GitHub |
| staghado/vit.cpp | pytorch | Mentioned in GitHub |
| s-chh/pytorch-scratch-vision-transformer-vit | pytorch | Mentioned in GitHub |
| mahmoodlab/hipt | pytorch | Mentioned in GitHub |
| Ugenteraan/Vanilla-ViT | pytorch | Mentioned in GitHub |
| DominikBatic/EndoViT | pytorch | Mentioned in GitHub |
| tahmid0007/VisionTransformer | pytorch | Mentioned in GitHub |
| SforAiDl/vformer | pytorch | Mentioned in GitHub |
| explainingai-code/VIT-Pytorch | pytorch | Mentioned in GitHub |
| meowbutlerdev/ViT | pytorch | Mentioned in GitHub |
| nasa-impact/hls-foundation-os | pytorch | Mentioned in GitHub |
| Mind23-2/MindCode-1 | paddle | Mentioned in GitHub |
| nachiket273/VisTrans | pytorch | Mentioned in GitHub |
| zpc-666/Paddle-R-Drop | paddle | Mentioned in GitHub |
| modeeric/eegvit-tcnet | pytorch | Mentioned in GitHub |
| nateraw/lightning-vision-transformer | pytorch | Mentioned in GitHub |
| towhee-io/towhee | pytorch | - |
| protonx-engineering/vit | tf | Mentioned in GitHub |
| jeonsworld/ViT-pytorch | pytorch | Mentioned in GitHub |
| holdfire/FAS | pytorch | Mentioned in GitHub |
| asyml/vision-transformer-pytorch | jax | Mentioned in GitHub |
| jo1jun/Vision_Transformer | pytorch | Mentioned in GitHub |
| lukas-blecher/LaTeX-OCR | pytorch | Mentioned in GitHub |
| woctezuma/steam-CLIP | - | Mentioned in GitHub |
| tintn/vision-transformer-from-scratch | pytorch | Mentioned in GitHub |
| smitheric95/MoCoViT-PyTorch | pytorch | Mentioned in GitHub |
| uygarkurt/ViT-PyTorch | pytorch | Mentioned in GitHub |
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| domain-generalization-on-vizwiz | ViT-8/B-224 | Accuracy - Clean Images: 450 | 
| domain-generalization-on-vizwiz | ViT-16/L-224 | Accuracy - All Images: 49 | 
| fine-grained-image-classification-on-oxford-2 | ViT-B/16 | Top-1 Error Rate: 6.2% | 
| image-classification-on-cifar-10 | ViT-H/14 | Percentage correct: 99.5 | 
| image-classification-on-cifar-10 | ViT-L/16 | Percentage correct: 99.42 | 
| image-classification-on-flowers-102 | - | Accuracy: 99.68 | 
| image-classification-on-imagenet | ViT-L/16 | Top 1 Accuracy: 87.76% | 
| image-classification-on-imagenet | ViT-Large | Top 1 Accuracy: 24% | 
| image-classification-on-imagenet | - | Top 5 Accuracy: 23.72 | 
| image-classification-on-imagenet | ViT-H/14 | Top 1 Accuracy: 88.55% | 
| image-classification-on-objectnet | ViT-H/14 | Top-5 Accuracy: 82.1 |