MixPro: Data Augmentation with MaskMix and Progressive Attention Labeling for Vision Transformer

Qihao Zhao, Yangyu Huang, Wei Hu, Fan Zhang, Jun Liu

Abstract

The recently proposed data augmentation TransMix employs attention labels to help vision transformers (ViTs) achieve better robustness and performance. However, TransMix is deficient in two aspects: 1) the image cropping method of TransMix may not be suitable for ViTs; 2) at the early stage of training, the model produces unreliable attention maps, and TransMix uses these unreliable attention maps to compute mixed attention labels that can mislead the model. To address these issues, we propose MaskMix and Progressive Attention Labeling (PAL), which operate in image space and label space, respectively. In detail, from the perspective of image space, we design MaskMix, which mixes two images based on a patch-like grid mask. In particular, the size of each mask patch is adjustable and is a multiple of the image patch size, which ensures that each image patch comes from only one image and contains more global content. From the perspective of label space, we design PAL, which utilizes a progressive factor to dynamically re-weight the attention weights of the mixed attention label. Finally, we combine MaskMix and Progressive Attention Labeling into our new data augmentation method, named MixPro. Experimental results show that our method improves various ViT-based models at various scales on ImageNet classification (73.8% top-1 accuracy with DeiT-T for 300 epochs). After being pre-trained with MixPro on ImageNet, ViT-based models also demonstrate better transferability to semantic segmentation, object detection, and instance segmentation. Furthermore, compared to TransMix, MixPro shows stronger robustness on several benchmarks. The code is available at https://github.com/fistyee/MixPro.
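
To make the MaskMix idea concrete, here is a minimal PyTorch sketch based only on the description above, not the official fistyee/MixPro code; the function name maskmix and the mask_scale parameter are hypothetical. It draws a Bernoulli grid mask whose cells span a multiple of the ViT patch size, upsamples it to pixel resolution, and mixes two image batches so that every image patch comes entirely from one source image:

```python
import torch

def maskmix(x1, x2, patch_size=16, mask_scale=2, p=0.5):
    """Minimal MaskMix-style sketch (hypothetical helper, not the official code).

    Mixes two image batches with a patch-aligned binary grid mask. Each mask
    cell spans `mask_scale` ViT patches per side, so every image patch comes
    entirely from one of the two source images.
    """
    B, C, H, W = x1.shape
    cell = patch_size * mask_scale             # mask cell side, a multiple of the ViT patch size
    gh, gw = H // cell, W // cell              # grid of mask cells over the image
    # Bernoulli grid mask: 1 -> take pixels from x1, 0 -> take pixels from x2
    grid = (torch.rand(B, 1, gh, gw, device=x1.device) < p).float()
    # Upsample the coarse grid to pixel resolution (nearest-neighbor by repetition)
    mask = grid.repeat_interleave(cell, dim=2).repeat_interleave(cell, dim=3)
    mixed = mask * x1 + (1.0 - mask) * x2
    lam_area = mask.mean(dim=(1, 2, 3))        # per-sample area ratio coming from x1
    return mixed, mask, lam_area
```

Because the mask cell is a multiple of the patch size, no image patch straddles the two sources, and a larger mask_scale gives each pasted region more global content, as the abstract argues.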
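Progressive Attention Labeling can be sketched similarly. The exact schedule below is an assumption (a linear ramp of the progressive factor over epochs); the abstract only states that a progressive factor dynamically re-weights the attention-derived part of the mixed label because early-training attention maps are unreliable:

```python
import torch

def pal_lambda(lam_area, attn, patch_mask, epoch, total_epochs):
    """Progressive Attention Labeling (PAL) sketch; the blending formula and
    linear ramp are assumptions, not the official implementation.

    attn:       (B, N) class-token attention over the N image patches
    patch_mask: (B, N) binary, 1 where the patch came from image x1
                (the MaskMix mask reduced to patch granularity)
    """
    # Attention-based mixing ratio: attention mass falling on x1's patches
    lam_attn = (attn * patch_mask).sum(dim=1) / attn.sum(dim=1).clamp_min(1e-8)
    # Progressive factor: trust the attention map more as training advances
    alpha = epoch / total_epochs
    return (1.0 - alpha) * lam_area + alpha * lam_attn
```

The returned ratio would then weight the two ground-truth labels in the mixed soft label, e.g. loss = lam * CE(logits, y1) + (1 - lam) * CE(logits, y2), so that early in training the label follows the reliable area ratio and later follows the attention map.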

Code Repositories

fistyee/mixpro (official PyTorch implementation): https://github.com/fistyee/MixPro

Benchmarks

Benchmark | Methodology | Metric
data-augmentation-on-imagenet | DeiT-S (+MixPro) | Accuracy: 81.3%
data-augmentation-on-imagenet | DeiT-T (+MixPro) | Accuracy: 73.8%
data-augmentation-on-imagenet | DeiT-B (+MixPro) | Accuracy: 82.9%
image-classification-on-imagenet | PVT-T (+MixPro) | Top-1 Accuracy: 76.7%
image-classification-on-imagenet | DeiT-T (+MixPro) | Top-1 Accuracy: 73.8%
image-classification-on-imagenet | DeiT-B (+MixPro) | Top-1 Accuracy: 82.9%
image-classification-on-imagenet | CaiT-XXS (+MixPro) | Top-1 Accuracy: 80.6%
image-classification-on-imagenet | PVT-M (+MixPro) | Top-1 Accuracy: 82.7%
image-classification-on-imagenet | PVT-S (+MixPro) | Top-1 Accuracy: 81.2%
image-classification-on-imagenet | CA-Swin-S (+MixPro) | Top-1 Accuracy: 83.7%
image-classification-on-imagenet | CA-Swin-T (+MixPro) | Top-1 Accuracy: 82.8%
image-classification-on-imagenet | XCiT-M (+MixPro) | Top-1 Accuracy: 84.1%
