Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers

Wei Siyuan ; Ye Tianzhu ; Zhang Shen ; Tang Yao ; Liang Jiajun

Abstract

Although vision transformers (ViTs) have shown promising results in various computer vision tasks recently, their high computational cost limits their practical applications. Previous approaches that prune redundant tokens have demonstrated a good trade-off between performance and computation costs. Nevertheless, errors caused by pruning strategies can lead to significant information loss. Our quantitative experiments reveal that the impact of pruned tokens on performance should be noticeable. To address this issue, we propose a novel joint Token Pruning & Squeezing module (TPS) for compressing vision transformers with higher efficiency. Firstly, TPS adopts pruning to get the reserved and pruned subsets. Secondly, TPS squeezes the information of pruned tokens into partial reserved tokens via the unidirectional nearest-neighbor matching and similarity-based fusing steps. Compared to state-of-the-art methods, our approach outperforms them under all token pruning intensities. Especially while shrinking DeiT-tiny&small computational budgets to 35%, it improves the accuracy by 1%-6% compared with baselines on ImageNet classification. The proposed method can accelerate the throughput of DeiT-small beyond DeiT-tiny, while its accuracy surpasses DeiT-tiny by 4.78%. Experiments on various transformers demonstrate the effectiveness of our method, while analysis experiments prove our higher robustness to the errors of the token pruning policy. Code is available at https://github.com/megvii-research/TPS-CVPR2023.
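
The two steps named in the abstract (pruning, then squeezing through unidirectional nearest-neighbor matching and similarity-based fusing) can be illustrated with a small sketch. This is not the code from the official repository. The function names `cosine` and `prune_and_squeeze`, the use of a plain per-token score to decide what is pruned, and the fusing weights (exponentiated cosine similarity, with each reserved token weighted by its self-similarity of 1) are all assumptions made for illustration. It uses only the Python standard library so that it runs anywhere.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def prune_and_squeeze(tokens, scores, keep):
    """Keep the `keep` (>= 1) highest-scoring tokens and fold the rest into them."""
    # Step 1 (pruning): split token indices into reserved and pruned subsets.
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    reserved, pruned = sorted(order[:keep]), sorted(order[keep:])

    # Each reserved token starts with its own features at weight exp(1),
    # i.e. exp of its cosine similarity with itself (assumed weighting).
    weight = {j: math.e for j in reserved}
    acc = {j: [math.e * x for x in tokens[j]] for j in reserved}

    for i in pruned:
        # Step 2a (unidirectional matching): each pruned token picks its
        # most similar reserved token; reserved tokens never pick anything.
        sims = {j: cosine(tokens[i], tokens[j]) for j in reserved}
        host = max(sims, key=sims.get)
        # Step 2b (similarity-based fusing): add the pruned token to its
        # host with weight exp(similarity) (assumed weighting).
        w = math.exp(sims[host])
        weight[host] += w
        acc[host] = [a + w * x for a, x in zip(acc[host], tokens[i])]

    # Normalise so every output token is a weighted average of its group.
    return [[a / weight[j] for a in acc[j]] for j in reserved]


# Four 2-D tokens; the two lowest-scoring ones are squeezed into the others.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.0, 2.0]]
scores = [0.9, 0.8, 0.1, 0.2]
fused = prune_and_squeeze(tokens, scores, keep=2)
```

In this toy input, the third token is absorbed by the first and the fourth by the second, so the output has two tokens that still carry some information from the pruned ones instead of discarding it outright.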

Code Repositories

megvii-research/tps-cvpr2023 (Official, PyTorch)

Benchmarks

| Benchmark | Methodology | GFLOPs | Top 1 Accuracy (%) |
|---|---|---|---|
| efficient-vits-on-imagenet-1k-with-deit-s | eTPS | 3.0 | 79.7 |
| efficient-vits-on-imagenet-1k-with-deit-s | dTPS | 3.0 | 80.1 |
| efficient-vits-on-imagenet-1k-with-deit-t | eTPS | 0.8 | 72.3 |
| efficient-vits-on-imagenet-1k-with-deit-t | dTPS | 0.8 | 72.9 |
| efficient-vits-on-imagenet-1k-with-lv-vit-s | eTPS | 3.8 | 82.5 |
| efficient-vits-on-imagenet-1k-with-lv-vit-s | dTPS | 3.8 | 82.6 |
