Improving Image Clustering with Artifacts Attenuation via Inference-Time Attention Engineering
Nakamura Kazumoto; Nozawa Yuji; Lin Yu-Chieh; Nakata Kengo; Ng Youyang

Abstract
The goal of this paper is to improve the performance of pretrained Vision Transformer (ViT) models, particularly DINOv2, on the image clustering task without requiring re-training or fine-tuning. As model size increases, a high-norm artifact anomaly appears in the patches of multi-head attention. We observe that this anomaly leads to reduced accuracy in zero-shot image clustering. These artifacts are characterized by disproportionately large values in the attention map compared to other patch tokens. To address these artifacts, we propose an approach called Inference-Time Attention Engineering (ITAE), which manipulates the attention function during inference. Specifically, we identify the artifacts by investigating one of the Query-Key-Value (QKV) patches in the multi-head attention and attenuate their corresponding attention values inside the pretrained models. ITAE improves clustering accuracy on multiple datasets by yielding more expressive features in the latent space. Our findings highlight the potential of ITAE as a practical solution for reducing artifacts in pretrained ViT models and improving model performance in clustering tasks without the need for re-training or fine-tuning.
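The detect-then-attenuate idea described in the abstract can be sketched for a single attention head as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the abstract does not say which of Q, K or V is inspected or how artifacts are thresholded. The hypothetical helper `find_artifact_tokens` assumes artifacts are tokens whose key-vector L2 norm exceeds `ratio` times the median key norm, and the hypothetical `attenuated_attention` assumes "attenuate" means scaling the post-softmax attention weights paid to those tokens by a factor `alpha` and renormalizing each row. Token 0 is assumed to be the [CLS] token.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def find_artifact_tokens(k, ratio=3.0):
    """ASSUMPTION (hypothetical detector): flag tokens whose key-vector L2 norm
    exceeds `ratio` times the median key norm. k has shape (num_tokens, head_dim)."""
    norms = np.linalg.norm(k, axis=-1)
    return norms > ratio * np.median(norms)


def attenuated_attention(q, k, v, ratio=3.0, alpha=0.1, skip_cls=True):
    """Single-head attention that down-weights attention paid to flagged tokens.

    q, k, v: (num_tokens, head_dim). `alpha` (hypothetical) scales the
    post-softmax weights of flagged key tokens; rows are then renormalised.
    Returns (output, attention_weights, artifact_mask).
    """
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (N, N) attention map
    mask = find_artifact_tokens(k, ratio)             # (N,) bool over key tokens
    if skip_cls:
        mask[0] = False  # ASSUMPTION: token 0 is [CLS] and is never attenuated
    weights[:, mask] *= alpha                          # attenuate artifact columns
    weights /= weights.sum(axis=-1, keepdims=True)     # rows sum to 1 again
    return weights @ v, weights, mask
```

Setting `alpha=1.0` recovers standard attention, which makes it easy to compare the attenuated and unmodified attention maps side by side.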
Benchmarks
| Benchmark | Method | Backbone | Split | ARI | Accuracy | NMI |
|---|---|---|---|---|---|---|
| image-clustering-on-cifar-10 | ITAE | ViT-B/14 | Test | 0.7946 | 0.8449 | 0.8682 |
| image-clustering-on-cifar-100 | ITAE | ViT-B/14 | Test | 0.5053 | 0.6502 | 0.771 |
| image-clustering-on-stl-10 | ITAE | ViT-B/14 | Test | 0.7594 | 0.8276 | 0.8818 |
| image-clustering-on-tiny-imagenet | ITAE | not reported | not reported | 0.5227 | 0.6823 | 0.8178 |
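For clustering, an Accuracy score is commonly computed by first matching each predicted cluster to one ground-truth class, because cluster IDs are arbitrary. The page does not state which definition was used here; the sketch below assumes the usual Hungarian-matched accuracy and uses SciPy's `linear_sum_assignment`. `clustering_accuracy` is a hypothetical helper name, and labels are assumed to be non-negative integers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(y_true, y_pred):
    """Best-match accuracy: map each predicted cluster to a distinct class so that
    the number of correctly assigned samples is maximised (Hungarian method)."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    n = max(y_true.max(), y_pred.max()) + 1
    # contingency[i, j] = number of samples in predicted cluster i with true label j
    contingency = np.zeros((n, n), dtype=int)
    np.add.at(contingency, (y_pred, y_true), 1)
    rows, cols = linear_sum_assignment(contingency, maximize=True)
    return contingency[rows, cols].sum() / y_true.size
```

ARI and NMI need no matching step because they are invariant to permutations of cluster labels; scikit-learn provides them as `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.normalized_mutual_info_score`.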