An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly

Abstract

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
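The phrase "sequences of image patches" can be made concrete with a small sketch. This is an illustration only, not the authors' implementation: the function name `patchify`, the nested-list image representation, and the 32x32 toy image are assumptions chosen for readability, and the 16x16 patch size is taken from the paper's title.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into a sequence of flattened patches.

    Patches are emitted in row-major order (left to right, top to bottom),
    and each patch is flattened into one list of patch_size * patch_size * C values.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])  # append the C channel values
            patches.append(flat)
    return patches


# Hypothetical toy input: a 32x32 image with 3 channels per pixel.
image = [[[r, c, 0] for c in range(32)] for r in range(32)]
tokens = patchify(image, 16)

# A 32x32 image yields (32/16)^2 = 4 patches, each of length 16*16*3 = 768.
print(len(tokens), len(tokens[0]))
```

In a full model, each flattened patch would then be linearly projected to the Transformer's embedding width and combined with a position embedding before entering the encoder; those steps are omitted here.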

Code Repositories

OML-Team/open-metric-learning

pytorch

rayanramoul/Visual-Transformer-PyTorch

pytorch

Mentioned in GitHub

UdbhavPrasad072300/Transformer-Implementations

pytorch

Mentioned in GitHub

soumik12345/Vision-Transformer

Thanusan19/Vision_Transformer

jax

Mentioned in GitHub

YanYan0716/vision_transform

Mentioned in GitHub

ludics/ViT-Retri

pytorch

Mentioned in GitHub

kornia/kornia

pytorch

SupreethRao99/VisionTransformer

pytorch

Mentioned in GitHub

quanmario0311/ViT_PyTorch

pytorch

Mentioned in GitHub

AlifAshrafee/ViT-pytorch-for-Cooking-State-Recognition

pytorch

Mentioned in GitHub

haiyang-w/git

pytorch

Mentioned in GitHub

mtancak1/PyTorch-ViT-Visual-Transformer

pytorch

Mentioned in GitHub

mashaan14/VisionTransformer-MNIST

pytorch

ruiqirichard/eegeyenet-vit

pytorch

Mentioned in GitHub

james77777778/keras-image-models

pytorch

Mentioned in GitHub

KiUngSong/Vision

pytorch

Mentioned in GitHub

DavidLandup0/deepvision

pytorch

xiuyu0000/papers_with_examples/tree/main/ViT

mindspore

nima1999nikkhah/ViT-Hybrid

pytorch

Mentioned in GitHub

timH6502/VisionTransformer-PyTorch

pytorch

Mentioned in GitHub

liuxingwt/CLS

pytorch

Mentioned in GitHub

konstantinos-p/image_classification_SOTA

pytorch

Mentioned in GitHub

qiaopTDUN/mae-repo

pytorch

Mentioned in GitHub

sayannath/ViT-Image-Classification

Mentioned in GitHub

SHI-Labs/Compact-Transformers

pytorch

Mentioned in GitHub

jankrepl/mildlyoverfitted

jax

faustomorales/vit-keras

Mentioned in GitHub

asarigun/TransGAN

pytorch

Mentioned in GitHub

PaddlePaddle/PASSL

paddle

shahrukhx01/ocr-test

pytorch

Mentioned in GitHub

charchit7/Using_Transoformers

pytorch

Mentioned in GitHub

mdmhriday/vision-transformers

pytorch

keras-team/keras-cv/blob/master/keras_cv/models/vit.py

rwightman/pytorch-image-models

pytorch

Mentioned in GitHub

mindspore-ecosystem/mindcv/blob/main/mindcv/models/vit.py

mindspore

megvii-research/basecls/tree/main/zoo/public/vit

BaiqiangGit/15minCode

pytorch

Mentioned in GitHub

PaddlePaddle/PLSC/tree/master/task/classification/vit

paddle

wangguanan/light-reid

pytorch

Mentioned in GitHub

avinash31d/paper-implementations/tree/main/vit

jiangtaoxie/So-ViT

pytorch

Mentioned in GitHub

naver-ai/pflayer

pytorch

Mentioned in GitHub

SrinjaySarkar/ViT

pytorch

Mentioned in GitHub

Westlake-AI/openmixup

pytorch

Mentioned in GitHub

Ugenteraan/Masked-AutoEncoder-PyTorch

pytorch

Mentioned in GitHub

lukemelas/PyTorch-Pretrained-ViT

pytorch

tuvovan/Vision_Transformer_Keras

bshantam97/Attention_Based_Networks

pytorch

Mentioned in GitHub

smu-ivpl/DeepfakeDetection

pytorch

Mentioned in GitHub

lucidrains/vit-pytorch

pytorch

ZhouDaShan123/vit

mindspore

seujung/pytorch-vit

pytorch

The-AI-Summer/self_attention

pytorch

MindSpore-scientific/code-7/tree/main/VisionTransformer

keras-team/keras-io/blob/master/examples/vision/image_classification_with_vision_transformer.py

facebookresearch/ClassyVision/tree/master/examples/vit

pytorch

jaketae/mlp-mixer

pytorch

Mentioned in GitHub

emla2805/vision-transformer

PaddlePaddle/PaddleClas

paddle

innat/LearnedResizer-Vision-Transformer

Mentioned in GitHub

conceptofmind/ViT-haiku

jax

Mentioned in GitHub

Julien-pour/music_classifcation

pytorch

Mentioned in GitHub

gnoses/ViT_examples

pytorch

Mentioned in GitHub

TACJu/TransFG

pytorch

Mentioned in GitHub

kingcong/vit

mindspore

Mentioned in GitHub

gupta-abhay/ViT

pytorch

skchen1993/TrangFG

pytorch

Mentioned in GitHub

BrianPulfer/PapersReimplementations

pytorch

Mentioned in GitHub

septmars/DL

pytorch

gimme1dollar/vision-transformer

Mentioned in GitHub

Abdulrahman-Adel/Real-Life-Violence-Detection

Mentioned in GitHub

UdbhavPrasad072300/Transformer-Implementation

pytorch

Mentioned in GitHub

ra1ph2/Vision-Transformer

pytorch

ashishpatel26/Vision-Transformer-Keras-Tensorflow-Pytorch-Examples

pytorch

Mentioned in GitHub

tw-yuhsi/a-new-perspective-for-shuttlecock-hitting-event-detection

pytorch

Mentioned in GitHub

BebDong/MXNetSeg

mxnet

Mentioned in GitHub

KatherLab/HIA

pytorch

Mentioned in GitHub

UdbhavPrasad072300/Transformer-Implementation-and-Language-Translation

pytorch

Mentioned in GitHub

facebookresearch/vissl

pytorch

Mentioned in GitHub

drumpt/ViT

pytorch

Mentioned in GitHub

google-research/vision_transformer

Official

jax

Mentioned in GitHub

martinsbruveris/tensorflow-image-models

Mentioned in GitHub

dispink/xpt

pytorch

Mentioned in GitHub

alibaba/EasyCV

pytorch

kakaobrain/coyo-dataset

pytorch

Mentioned in GitHub

IMvision12/keras-vision-models

pytorch

Mentioned in GitHub

open-mmlab/mmclassification

pytorch

alililia/vit_base_GPU

mindspore

Mentioned in GitHub

kamalkraj/Vision-Transformer

sangHa0411/VIT

pytorch

Mentioned in GitHub

wish44165/A-New-Perspective-for-Shuttlecock-Hitting-Event-Detection

pytorch

Mentioned in GitHub

sneakatyou/ViT-Tensorflow-2.0

Mentioned in GitHub

BR-IDL/PaddleViT/blob/main/image_classification/ViT

paddle

mindspore-ai/models/blob/master/research/cv/vit_base/

mindspore

stevenwalton/scs-cct

pytorch

Mentioned in GitHub

pytorch/vision/blob/main/torchvision/models/vision_transformer.py

pytorch

huggingface/transformers

pytorch

Mentioned in GitHub

04RR/SOTA-Vision

pytorch

Mentioned in GitHub

YousefGamal220/Vision-Transformers

pytorch

Mentioned in GitHub

Burf/VisionTransformer-Tensorflow2

Mentioned in GitHub

junyongyou/triq

pytorch

Mentioned in GitHub

https://gitlab.com/birder/birder

pytorch

xiuyu0000/new_papers_codes/tree/main/vit

mindspore

nachiket273/Vision_transformer_pytorch

pytorch

Mentioned in GitHub

mindspore-ai/models/tree/master/official/cv/vit

mindspore

alililia/vit_base_Ascend

mindspore

Mentioned in GitHub

facebookresearch/hiera

pytorch

Mentioned in GitHub

Mind23-2/MindCode-89

mindspore

Mentioned in GitHub

ahmed-alllam/Equinox/blob/main/examples/vision_transformer.ipynb

jax

mtancak/PyTorch-ViT-Visual-Transformer

pytorch

Mentioned in GitHub

arkel23/PyTorch-Pretrained-ViT

pytorch

jacobgil/vit-explain

pytorch

Mentioned in GitHub

ttt496/VisionTransformer

jax

Mentioned in GitHub

HyeonhoonLee/MAIC2021_Sleep

pytorch

Mentioned in GitHub

sliao-mi-luku/Galaxy-Zoo-Classification

pytorch

Mentioned in GitHub

purbayankar/Hyperspectral-Vision-Transformer

pytorch

Mentioned in GitHub

davisking/dlib-models

Mentioned in GitHub

TheTensorDude/vision_transformer_tf

Mentioned in GitHub

gmum/dl-mo-2021

Mentioned in GitHub

Mayurji/Image-Classification-PyTorch

pytorch

Mentioned in GitHub

zer0sh0t/artificial_intelligence/tree/master/vision_models/vision_transformer

pytorch

holdfire/CLS

pytorch

Mentioned in GitHub

labmlai/annotated_deep_learning_paper_implementations

pytorch

Kevinz-code/CSRA

pytorch

Mentioned in GitHub

HzcIrving/DeepLearning_PlayGround/tree/main/VIT

pytorch

Oguzhanercan/Vision-Transformers

Aedelon/ViT-PyTorch-Replication

pytorch

Mentioned in GitHub

staghado/vit.cpp

pytorch

Mentioned in GitHub

s-chh/pytorch-scratch-vision-transformer-vit

pytorch

Mentioned in GitHub

mahmoodlab/hipt

pytorch

Mentioned in GitHub

Ugenteraan/Vanilla-ViT

pytorch

Mentioned in GitHub

DominikBatic/EndoViT

pytorch

Mentioned in GitHub

tahmid0007/VisionTransformer

pytorch

Mentioned in GitHub

SforAiDl/vformer

pytorch

Mentioned in GitHub

mujiyantosvc/Facial-Expression-Recognition-FER-for-Mental-Health-Detection-

pytorch

Mentioned in GitHub

explainingai-code/VIT-Pytorch

pytorch

Mentioned in GitHub

meowbutlerdev/ViT

pytorch

Mentioned in GitHub

nasa-impact/hls-foundation-os

pytorch

Mentioned in GitHub

Mind23-2/MindCode-1

paddle

Mentioned in GitHub

nachiket273/VisTrans

pytorch

Mentioned in GitHub

zpc-666/Paddle-R-Drop

paddle

Mentioned in GitHub

modeeric/eegvit-tcnet

pytorch

Mentioned in GitHub

affjljoo3581/deit3-jax

jax

nateraw/lightning-vision-transformer

pytorch

Mentioned in GitHub

towhee-io/towhee

pytorch

protonx-engineering/vit

Mentioned in GitHub

jeonsworld/ViT-pytorch

pytorch

Mentioned in GitHub

holdfire/FAS

pytorch

Mentioned in GitHub

asyml/vision-transformer-pytorch

jax

Mentioned in GitHub

leemsaebom/attention-guided-cam-visual-explanations-of-vision-transformer-guided-by-self-attention

pytorch

Mentioned in GitHub

jo1jun/Vision_Transformer

pytorch

Mentioned in GitHub

lukas-blecher/LaTeX-OCR

pytorch

Mentioned in GitHub

woctezuma/steam-CLIP

Mentioned in GitHub

tintn/vision-transformer-from-scratch

pytorch

Mentioned in GitHub

uzi0espil/research-papers-implementation/tree/master/Vision%20Transformer

smitheric95/MoCoViT-PyTorch

pytorch

Mentioned in GitHub

uygarkurt/ViT-PyTorch

pytorch

Mentioned in GitHub

Benchmarks

Benchmark	Methodology	Metrics
domain-generalization-on-vizwiz	ViT-8/B-224	Accuracy - Clean Images: 450
domain-generalization-on-vizwiz	ViT-16/L-224	Accuracy - All Images: 49
fine-grained-image-classification-on-oxford-2	ViT-B/16	Top-1 Error Rate: 6.2%
image-classification-on-cifar-10	ViT-H/14	Percentage correct: 99.5
image-classification-on-cifar-10	ViT-L/16	Percentage correct: 99.42
image-classification-on-flowers-102	-	Accuracy: 99.68
image-classification-on-imagenet	ViT-L/16	Top 1 Accuracy: 87.76%
image-classification-on-imagenet	ViT-Large	Top 1 Accuracy: 24%
image-classification-on-imagenet	-	Top 5 Accuracy: 23.72
image-classification-on-imagenet	ViT-H/14	Top 1 Accuracy: 88.55%
image-classification-on-objectnet	ViT-H/14	Top-5 Accuracy: 82.1
