Training data-efficient image transformers & distillation through attention

Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou

Abstract

Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on ImageNet only. We train it on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop evaluation) on ImageNet with no external data. More importantly, we introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention. We show the interest of this token-based distillation, especially when using a convnet as a teacher. This leads us to report results competitive with convnets for both ImageNet (where we obtain up to 85.2% accuracy) and when transferring to other tasks. We share our code and models.
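
To make the distillation-token idea described in the abstract concrete, the sketch below shows one way it can be implemented in PyTorch. This is a minimal illustrative sketch, not the paper's official code (that is facebookresearch/deit, listed under Code Repositories below): the names `DistilledViT` and `hard_distillation_loss` and the tiny model dimensions are assumptions chosen for brevity.

```python
# Minimal sketch of DeiT-style distillation through attention (illustrative, not the official code).
# A learnable distillation token is appended to the patch sequence alongside the class token;
# it interacts with the other tokens via self-attention and is supervised by the hard
# predictions of a (convnet) teacher, while the class token is supervised by the true labels.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DistilledViT(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=192, depth=4,
                 heads=3, num_classes=1000):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # Patch embedding via a strided convolution.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        # Class token, distillation token, and positional embeddings (zero-init for brevity).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.norm = nn.LayerNorm(dim)
        # Two linear heads: one on the class token, one on the distillation token.
        self.head_cls = nn.Linear(dim, num_classes)
        self.head_dist = nn.Linear(dim, num_classes)

    def forward(self, x):
        b = x.size(0)
        patches = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls_token.expand(b, -1, -1)
        dist = self.dist_token.expand(b, -1, -1)
        tokens = torch.cat([cls, dist, patches], dim=1) + self.pos_embed
        tokens = self.norm(self.encoder(tokens))
        # Return logits from the class head and the distillation head.
        return self.head_cls(tokens[:, 0]), self.head_dist(tokens[:, 1])


def hard_distillation_loss(logits_cls, logits_dist, teacher_logits, targets):
    """Hard-label distillation: the class head is trained on ground-truth labels,
    the distillation head on the teacher's argmax predictions; the two losses are averaged."""
    teacher_labels = teacher_logits.argmax(dim=1)
    return 0.5 * F.cross_entropy(logits_cls, targets) \
         + 0.5 * F.cross_entropy(logits_dist, teacher_labels)


if __name__ == "__main__":
    model = DistilledViT()
    images = torch.randn(2, 3, 224, 224)
    targets = torch.randint(0, 1000, (2,))
    teacher_logits = torch.randn(2, 1000)   # stand-in for a convnet teacher's outputs
    out_cls, out_dist = model(images)
    loss = hard_distillation_loss(out_cls, out_dist, teacher_logits, targets)
    print(loss.item())
```

At test time the paper fuses the two classifiers, e.g. by adding their softmax outputs; the distillation token only changes the training supervision, not the backbone architecture.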

Code Repositories

omihub777/vit-cifar (pytorch)
gatech-eic/vitcod (pytorch)
liuxingwt/CLS (pytorch)
rwightman/pytorch-image-models (pytorch)
zhuhanqing/lightening-transformer (pytorch)
bshantam97/Attention_Based_Networks (pytorch)
smu-ivpl/DeepfakeDetection (pytorch)
ttt496/vit-pytorch (pytorch)
aiot-mlsys-lab/famba-v (pytorch)
TACJu/TransFG (pytorch)
skchen1993/TrangFG (pytorch)
facebookresearch/deit (official, pytorch)
IMvision12/keras-vision-models (pytorch)
asrafulashiq/deit-custom (pytorch)
huggingface/transformers (pytorch)
jacobgil/vit-explain (pytorch)
nus-hpc-ai-lab/dyvm (pytorch)
holdfire/CLS (pytorch)
s-chh/patchrot (pytorch)
alessiomora/unlearning_fl (tf)
moein-shariatnia/Pix2Seq (pytorch)
open-edge-platform/geti (pytorch)
affjljoo3581/deit3-jax (jax)
ahmedelmahy/myownvit (pytorch)
holdfire/FAS (pytorch)
tianhai123/vit-pytorch (pytorch)
hustvl/vim (pytorch)

Benchmarks

Benchmark | Methodology | Metrics
--- | --- | ---
document-image-classification-on-rvl-cdip | DeiT-B | Accuracy: 90.32%; Params: 87M
document-layout-analysis-on-publaynet-val | DeiT-B | Figure: 0.957; List: 0.921; Overall: 0.932; Table: 0.972; Text: 0.934; Title: 0.874
efficient-vits-on-imagenet-1k-with-deit-s | Base (DeiT-S) | GFLOPs: 4.6; Top-1 Accuracy: 79.8
efficient-vits-on-imagenet-1k-with-deit-t | Base (DeiT-T) | GFLOPs: 1.2; Top-1 Accuracy: 72.2
fine-grained-image-classification-on-oxford | DeiT-B | Accuracy: 98.8%; Params: 86M
fine-grained-image-classification-on-stanford | DeiT-B | Accuracy: 93.3%; Params: 86M
image-classification-on-cifar-10 | DeiT-B | Percentage correct: 99.1
image-classification-on-cifar-100 | DeiT-B | Percentage correct: 90.8; Params: 86M
image-classification-on-flowers-102 | DeiT-B | Accuracy: 98.8%; Params: 86M
image-classification-on-imagenet | DeiT-B | Top-1 Accuracy: 84.2%; Params: 86M
image-classification-on-imagenet | DeiT-B 384 | Top-1 Accuracy: 85.2%; Params: 87M
image-classification-on-imagenet | DeiT-Ti | Top-1 Accuracy: 76.6%; Params: 5M
image-classification-on-imagenet | DeiT-S | Top-1 Accuracy: 82.6%; Params: 22M
image-classification-on-imagenet-real | DeiT-Ti | Accuracy: 82.1%; Params: 5M
image-classification-on-imagenet-real | DeiT-B | Accuracy: 88.7%; Params: 86M
image-classification-on-imagenet-real | DeiT-S | Accuracy: 86.8%; Params: 22M
image-classification-on-imagenet-real | DeiT-B-384 | Accuracy: 89.3%; Params: 86M
image-classification-on-inaturalist-2018 | DeiT-B | Top-1 Accuracy: 79.5%
