
All Tokens Matter: Token Labeling for Training Better Vision Transformers

Zihang Jiang, Qibin Hou, Li Yuan, Daquan Zhou, Yujun Shi, Xiaojie Jin, Anran Wang, Jiashi Feng

Abstract

In this paper, we present token labeling -- a new training objective for training high-performance vision transformers (ViTs). Different from the standard training objective of ViTs that computes the classification loss on an additional trainable class token, our proposed one takes advantage of all the image patch tokens to compute the training loss in a dense manner. Specifically, token labeling reformulates the image classification problem into multiple token-level recognition problems and assigns each patch token an individual location-specific supervision generated by a machine annotator. Experiments show that token labeling can clearly and consistently improve the performance of various ViT models across a wide spectrum. For a vision transformer with 26M learnable parameters serving as an example, with token labeling, the model can achieve 84.4% Top-1 accuracy on ImageNet. The result can be further increased to 86.4% by slightly scaling the model size up to 150M, delivering the minimal-sized model among previous models (250M+) reaching 86%. We also show that token labeling can clearly improve the generalization capability of the pre-trained models on downstream tasks with dense prediction, such as semantic segmentation. Our code and all the training details will be made publicly available at https://github.com/zihangJiang/TokenLabeling.
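The objective described in the abstract -- the usual classification loss on the class token plus a dense, per-patch loss against location-specific soft labels from a machine annotator -- can be sketched as follows. This is a minimal illustration, not the authors' exact implementation; the weighting factor `beta` and the shape of the annotator targets are assumptions.

```python
import torch
import torch.nn.functional as F

def token_labeling_loss(cls_logits, patch_logits, cls_target, token_targets, beta=0.5):
    """Sketch of a token-labeling objective.

    cls_logits:    (B, C) logits from the trainable class token.
    patch_logits:  (B, N, C) logits from the N image patch tokens.
    cls_target:    (B,) ground-truth image-level labels.
    token_targets: (B, N, C) soft, location-specific labels produced by a
                   machine annotator (assumed shape for this sketch).
    beta:          weight of the auxiliary token-level term (hypothetical value).
    """
    # Standard image-level classification loss on the class token.
    cls_loss = F.cross_entropy(cls_logits, cls_target)

    # Dense token-level recognition loss: soft cross-entropy per patch token,
    # averaged over all tokens and the batch -- "all tokens matter".
    log_probs = F.log_softmax(patch_logits, dim=-1)
    token_loss = -(token_targets * log_probs).sum(dim=-1).mean()

    return cls_loss + beta * token_loss
```

In training, `token_targets` would come from a pre-trained annotator network evaluated on the same image crops, so each patch receives supervision tied to its spatial location rather than a single global label.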

Code Repositories

- zihangJiang/TokenLabeling (official, PyTorch)
- sail-sg/dualformer (PyTorch)
- naver-ai/vidt (PyTorch)
- zhoudaquan/Refiner_ViT (PyTorch)
- catalpaaa/demansia (PyTorch)
- flytocc/TokenLabeling-paddle (Paddle)

Benchmarks

| Benchmark | Methodology | GFLOPs | Params | Top-1 Accuracy | Validation mIoU |
|---|---|---|---|---|---|
| efficient-vits-on-imagenet-1k-with-lv-vit-s | Base (LV-ViT-S) | 6.6 | - | 83.3% | - |
| image-classification-on-imagenet | LV-ViT-S | 6.6 | 26M | 83.3% | - |
| image-classification-on-imagenet | LV-ViT-M | 16 | 56M | 84.1% | - |
| image-classification-on-imagenet | LV-ViT-L | 214.8 | 151M | 86.4% | - |
| semantic-segmentation-on-ade20k | LV-ViT-L (UperNet, MS) | - | 209M | - | 51.8 |
