Training data-efficient image transformers & distillation through attention

Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou

Abstract

Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption. In this work, we produce a competitive convolution-free transformer by training on ImageNet only. We train them on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves top-1 accuracy of 83.1% (single-crop evaluation) on ImageNet with no external data. More importantly, we introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention. We show the interest of this token-based distillation, especially when using a convnet as a teacher. This leads us to report results competitive with convnets both on ImageNet (where we obtain up to 85.2% accuracy) and when transferring to other tasks. We share our code and models.
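The abstract only names the mechanism, so here is a minimal PyTorch sketch of the distillation-token idea as described: a learned distillation token is appended alongside the class token, interacts with the patch tokens through self-attention, and feeds a separate head that is supervised by the teacher's predictions (the hard-label variant of the loss is assumed here). All names (`DistillableViT`, `distillation_loss`) and the toy dimensions are illustrative choices, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillableViT(nn.Module):
    """Toy vision transformer with a class token and a distillation token."""

    def __init__(self, num_classes=10, dim=64, depth=2, heads=4,
                 num_patches=16, patch_dim=48):
        super().__init__()
        self.patch_embed = nn.Linear(patch_dim, dim)
        # Two learned tokens: one for classification, one for distillation.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)       # reads the class token
        self.head_dist = nn.Linear(dim, num_classes)  # reads the distillation token

    def forward(self, patches):
        # patches: (batch, num_patches, patch_dim), e.g. flattened image patches
        b = patches.size(0)
        x = self.patch_embed(patches)
        cls = self.cls_token.expand(b, -1, -1)
        dist = self.dist_token.expand(b, -1, -1)
        # Both tokens attend to the patch tokens (and to each other)
        # through the standard self-attention layers.
        x = torch.cat([cls, dist, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0]), self.head_dist(x[:, 1])

def distillation_loss(logits_cls, logits_dist, labels, teacher_logits):
    # Class token is trained on the ground-truth labels; the distillation
    # token is trained on the teacher's hard decisions (assumed hard-label
    # distillation); the two objectives are averaged.
    loss_cls = F.cross_entropy(logits_cls, labels)
    loss_dist = F.cross_entropy(logits_dist, teacher_logits.argmax(dim=-1))
    return 0.5 * loss_cls + 0.5 * loss_dist

if __name__ == "__main__":
    model = DistillableViT()
    patches = torch.randn(8, 16, 48)        # batch of 8 toy "images"
    labels = torch.randint(0, 10, (8,))
    teacher_logits = torch.randn(8, 10)     # stand-in for a convnet teacher
    logits_cls, logits_dist = model(patches)
    loss = distillation_loss(logits_cls, logits_dist, labels, teacher_logits)
    loss.backward()
```

At test time the paper fuses the two heads' predictions into a single classifier; the sketch omits that step for brevity.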

