
Abstract

Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens. It is an important but challenging task. Although recent approaches have achieved great success, they require a carefully handcrafted compression rate (i.e., the number of tokens to remove), which is tedious to set and leads to sub-optimal performance. To tackle this problem, we propose Differentiable Compression Rate (DiffRate), a novel token compression method with several appealing properties that prior work lacks. First, DiffRate propagates the gradient of the loss function onto the compression rate, which previous work treats as a non-differentiable hyperparameter. As a result, each layer can automatically learn its own compression rate without extra overhead. Second, token pruning and merging can be performed simultaneously in DiffRate, whereas previous work handled them in isolation. Third, extensive experiments demonstrate that DiffRate achieves state-of-the-art performance. For example, by applying the learned layer-wise compression rates to an off-the-shelf ViT-H (MAE) model, we achieve a 40% FLOPs reduction and a 1.5x throughput improvement with a minor accuracy drop of 0.16% on ImageNet without fine-tuning, even outperforming previous methods that use fine-tuning. Code and models are available at https://github.com/OpenGVLab/DiffRate.
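
To make the idea of a differentiable compression rate concrete, the sketch below shows one generic way to relax a discrete "number of tokens to keep" into a smooth quantity. It is an illustration under stated assumptions, not the DiffRate implementation: the function names (`soft_keep_mask`, `expected_keep_count`), the softmax parameterization, and the set of candidate keep counts are all hypothetical. Tokens are assumed to be already sorted by some importance score, and a layer holds one learnable logit per candidate keep count. The resulting mask is a weighted sum of hard masks, so it varies smoothly with the logits and a gradient could in principle flow back to them.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_keep_mask(logits, candidate_keep_counts, num_tokens):
    """Expected keep-mask over importance-sorted tokens (hypothetical helper).

    logits: one learnable score per candidate keep count (assumption).
    candidate_keep_counts: discrete options for how many tokens to keep.
    Returns an array of length num_tokens with values in [0, 1].
    """
    probs = softmax(np.asarray(logits, dtype=float))
    counts = np.asarray(candidate_keep_counts)
    ranks = np.arange(num_tokens)
    # hard_masks[k, i] is 1 if candidate k keeps the token at rank i.
    hard_masks = (ranks[None, :] < counts[:, None]).astype(float)
    # Weighted sum of hard masks: linear in probs, hence smooth in logits.
    return probs @ hard_masks

def expected_keep_count(logits, candidate_keep_counts):
    # Expected number of kept tokens; a compute penalty could act on this.
    probs = softmax(np.asarray(logits, dtype=float))
    return float(probs @ np.asarray(candidate_keep_counts, dtype=float))

# Two equally likely candidates (keep 2 or keep 4) over 4 sorted tokens.
mask = soft_keep_mask([0.0, 0.0], [2, 4], num_tokens=4)
print(mask)                                       # [1.  1.  0.5 0.5]
print(expected_keep_count([0.0, 0.0], [2, 4]))    # 3.0
```

In this toy setup the two top-ranked tokens are always kept, while the two lowest-ranked tokens receive weight 0.5, and the mask sums to the expected keep count of 3. A fractional weight could be read as a token that is partially suppressed during the search for a rate; how the actual method treats such tokens, and how it combines pruning with merging, is described in the paper and repository rather than here.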
