MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer

Qihao Zhao, Yangyu Huang, Wei Hu, Fan Zhang, Jun Liu

Abstract

The recently proposed data augmentation TransMix employs attention labels to help vision transformers (ViTs) achieve better robustness and performance. However, TransMix is deficient in two aspects: 1) the image cropping method of TransMix may not be suitable for ViTs; 2) at the early stage of training, the model produces unreliable attention maps, and TransMix uses these unreliable attention maps to compute mixed attention labels that can mislead the model. To address these issues, we propose MaskMix and Progressive Attention Labeling (PAL), which operate in image space and label space, respectively. In detail, from the perspective of image space, we design MaskMix, which mixes two images based on a patch-like grid mask. In particular, the size of each mask patch is adjustable and is a multiple of the image patch size, which ensures that each image patch comes from only one image and contains more global content. From the perspective of label space, we design PAL, which utilizes a progressive factor to dynamically re-weight the attention weights of the mixed attention label. Finally, we combine MaskMix and Progressive Attention Labeling into our new data augmentation method, named MixPro. Experimental results show that our method improves various ViT-based models at various scales on ImageNet classification (73.8% top-1 accuracy with DeiT-T for 300 epochs). After being pre-trained with MixPro on ImageNet, ViT-based models also demonstrate better transferability to semantic segmentation, object detection, and instance segmentation. Furthermore, compared to TransMix, MixPro shows stronger robustness on several benchmarks. The code is available at https://github.com/fistyee/MixPro.
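
To make the MaskMix idea concrete, here is a minimal PyTorch sketch based only on the description above, not the official fistyee/MixPro code; the function name maskmix and the mask_scale parameter are hypothetical. It draws a Bernoulli grid mask whose cells span a multiple of the ViT patch size, upsamples it to pixel resolution, and mixes two image batches so that every image patch comes entirely from one source image:

```python
import torch

def maskmix(x1, x2, patch_size=16, mask_scale=2, p=0.5):
    """Minimal MaskMix-style sketch (hypothetical helper, not the official code).

    Mixes two image batches with a patch-aligned binary grid mask. Each mask
    cell spans `mask_scale` ViT patches per side, so every image patch comes
    entirely from one of the two source images.
    """
    B, C, H, W = x1.shape
    cell = patch_size * mask_scale             # mask cell side, a multiple of the ViT patch size
    gh, gw = H // cell, W // cell              # grid of mask cells over the image
    # Bernoulli grid mask: 1 -> take pixels from x1, 0 -> take pixels from x2
    grid = (torch.rand(B, 1, gh, gw, device=x1.device) < p).float()
    # Upsample the coarse grid to pixel resolution (nearest-neighbor by repetition)
    mask = grid.repeat_interleave(cell, dim=2).repeat_interleave(cell, dim=3)
    mixed = mask * x1 + (1.0 - mask) * x2
    lam_area = mask.mean(dim=(1, 2, 3))        # per-sample area ratio coming from x1
    return mixed, mask, lam_area
```

Because the mask cell is a multiple of the patch size, no image patch straddles the two sources, and a larger mask_scale gives each pasted region more global content, as the abstract argues.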
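Progressive Attention Labeling can be sketched similarly. The exact schedule below is an assumption (a linear ramp of the progressive factor over epochs); the abstract only states that a progressive factor dynamically re-weights the attention-derived part of the mixed label because early-training attention maps are unreliable:

```python
import torch

def pal_lambda(lam_area, attn, patch_mask, epoch, total_epochs):
    """Progressive Attention Labeling (PAL) sketch; the blending formula and
    linear ramp are assumptions, not the official implementation.

    attn:       (B, N) class-token attention over the N image patches
    patch_mask: (B, N) binary, 1 where the patch came from image x1
                (the MaskMix mask reduced to patch granularity)
    """
    # Attention-based mixing ratio: attention mass falling on x1's patches
    lam_attn = (attn * patch_mask).sum(dim=1) / attn.sum(dim=1).clamp_min(1e-8)
    # Progressive factor: trust the attention map more as training advances
    alpha = epoch / total_epochs
    return (1.0 - alpha) * lam_area + alpha * lam_attn
```

The returned ratio would then weight the two ground-truth labels in the mixed soft label, e.g. loss = lam * CE(logits, y1) + (1 - lam) * CE(logits, y2), so that early in training the label follows the reliable area ratio and later follows the attention map.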

Code Repositories

fistyee/mixpro (official PyTorch implementation): https://github.com/fistyee/MixPro

Benchmarks

Benchmark | Methodology | Metric
data-augmentation-on-imagenet | DeiT-S (+MixPro) | Accuracy: 81.3%
data-augmentation-on-imagenet | DeiT-T (+MixPro) | Accuracy: 73.8%
data-augmentation-on-imagenet | DeiT-B (+MixPro) | Accuracy: 82.9%
image-classification-on-imagenet | PVT-T (+MixPro) | Top-1 Accuracy: 76.7%
image-classification-on-imagenet | DeiT-T (+MixPro) | Top-1 Accuracy: 73.8%
image-classification-on-imagenet | DeiT-B (+MixPro) | Top-1 Accuracy: 82.9%
image-classification-on-imagenet | CaiT-XXS (+MixPro) | Top-1 Accuracy: 80.6%
image-classification-on-imagenet | PVT-M (+MixPro) | Top-1 Accuracy: 82.7%
image-classification-on-imagenet | PVT-S (+MixPro) | Top-1 Accuracy: 81.2%
image-classification-on-imagenet | CA-Swin-S (+MixPro) | Top-1 Accuracy: 83.7%
image-classification-on-imagenet | CA-Swin-T (+MixPro) | Top-1 Accuracy: 82.8%
image-classification-on-imagenet | XCiT-M (+MixPro) | Top-1 Accuracy: 84.1%
