PPT: Token Pruning and Pooling for Efficient Vision Transformers

Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen

Abstract

Vision Transformers (ViTs) have emerged as powerful models in the field of computer vision, delivering superior performance across various vision tasks. However, their high computational complexity poses a significant barrier to practical deployment in real-world scenarios. Motivated by the fact that not all tokens contribute equally to the final predictions and that fewer tokens bring less computational cost, reducing redundant tokens has become a prevailing paradigm for accelerating vision transformers. However, we argue that it is not optimal to reduce only inattentive redundancy by token pruning, or only duplicative redundancy by token merging. To this end, in this paper we propose a novel acceleration framework, namely token Pruning & Pooling Transformers (PPT), to adaptively tackle these two types of redundancy in different layers. By heuristically integrating both token pruning and token pooling techniques in ViTs without additional trainable parameters, PPT effectively reduces the model complexity while maintaining its predictive accuracy. For example, PPT reduces over 37% of FLOPs and improves throughput by over 45% for DeiT-S without any accuracy drop on the ImageNet dataset. The code is available at https://github.com/xjwu1024/PPT and https://github.com/mindspore-lab/models/
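
The two operations the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' implementation (see the linked repositories for that): `prune_tokens` removes inattentive tokens by keeping those with the highest [CLS]-attention scores, while `pool_tokens` merges duplicative tokens by averaging the most similar pairs. The function names, shapes, attention scores, and the simplified bipartite matching are all assumptions made for the sketch.

```python
# Minimal sketch of the two redundancy-reduction operations PPT combines.
# Not the paper's actual code; shapes and names are illustrative.
import torch


def prune_tokens(x: torch.Tensor, cls_attn: torch.Tensor, k: int) -> torch.Tensor:
    """Token pruning: keep the k patch tokens with the highest [CLS]-attention
    score, discarding inattentive tokens.  x: (N, C), cls_attn: (N,)."""
    idx = cls_attn.topk(k).indices
    return x[idx]


def pool_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Token pooling: merge the r most similar (cosine) token pairs between the
    even- and odd-indexed halves of x, a simplified bipartite matching.
    x: (N, C); returns (N - r, C)."""
    a, b = x[0::2], x[1::2]                               # two disjoint token sets
    sim = torch.nn.functional.cosine_similarity(
        a.unsqueeze(1), b.unsqueeze(0), dim=-1)           # (Na, Nb) pairwise similarity
    best_sim, best_b = sim.max(dim=1)                     # best partner in b for each a-token
    src = best_sim.topk(r).indices                        # a-tokens to merge away
    dst = best_b[src]                                     # their partners in b
    b = b.index_add(0, dst, a[src])                       # fold merged tokens into partners
    counts = torch.ones(b.size(0), 1).index_add(0, dst, torch.ones(r, 1))
    b = b / counts                                        # average the merged tokens
    keep = torch.ones(a.size(0), dtype=torch.bool)
    keep[src] = False                                     # drop a-tokens that were merged
    return torch.cat([a[keep], b], dim=0)


# Toy usage: 196 DeiT-S-sized patch tokens; random scores stand in for the
# real [CLS]-attention produced inside a transformer block.
tokens = torch.randn(196, 384)
scores = torch.rand(196)
tokens = prune_tokens(tokens, scores, 150)   # reduce inattentive redundancy
tokens = pool_tokens(tokens, 30)             # reduce duplicative redundancy
print(tokens.shape)                          # torch.Size([120, 384])
```

In PPT the two operations are applied adaptively in different layers depending on which type of redundancy dominates; the sketch simply chains them once to show their effect on the token count.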

Code Repositories

xjwu1024/PPT (official, PyTorch)
mindspore-lab/models (official, MindSpore)

Benchmarks

Benchmark                                    | Method | GFLOPs | Top-1 Accuracy (%)
---------------------------------------------|--------|--------|-------------------
efficient-vits-on-imagenet-1k-with-deit-s    | PPT    | 2.9    | 79.8
efficient-vits-on-imagenet-1k-with-deit-t    | PPT    | 0.8    | 72.1
efficient-vits-on-imagenet-1k-with-lv-vit-s  | PPT    | 4.6    | 83.1
