Exploring the Limits of Deep Image Clustering using Pretrained Models

Nikolas Adaloglou Felix Michels Hamza Kalisch Markus Kollmann

Abstract

We present a general methodology that learns to classify images without labels by leveraging pretrained feature extractors. Our approach involves self-distillation training of clustering heads based on the fact that nearest neighbours in the pretrained feature space are likely to share the same label. We propose a novel objective that learns associations between image features by introducing a variant of pointwise mutual information together with instance weighting. We demonstrate that the proposed objective is able to attenuate the effect of false positive pairs while efficiently exploiting the structure in the pretrained feature space. As a result, we improve the clustering accuracy over k-means on 17 different pretrained models by 6.1% and 12.2% on ImageNet and CIFAR100, respectively. Finally, using self-supervised vision transformers, we achieve a clustering accuracy of 61.6% on ImageNet. The code is available at https://github.com/HHU-MMBS/TEMI-official-BMVC2023.
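
The two ingredients named in the abstract, nearest-neighbour pairs in the pretrained feature space and a pointwise mutual information score between cluster assignments, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mine_nearest_neighbours` and `generic_pmi` are hypothetical, the toy features are synthetic, and the score is the textbook pointwise mutual information under a shared cluster variable, not the paper's variant, which also uses instance weighting.

```python
import numpy as np


def mine_nearest_neighbours(features, k):
    """Return, for each row, the indices of its k most cosine-similar rows."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = normed @ normed.T
    np.fill_diagonal(similarity, -np.inf)  # a sample is not its own neighbour
    return np.argsort(-similarity, axis=1)[:, :k]


def generic_pmi(p_x, p_y, p_c):
    """Textbook PMI of two samples: log sum_c p(c|x) p(c|y) / p(c).

    Assumption: this is the generic form, not the paper's exact objective.
    """
    return np.log(np.sum(p_x * p_y / p_c, axis=-1))


# Synthetic stand-in for pretrained features: two well-separated groups.
rng = np.random.default_rng(0)
centres = np.array([[5.0, 0.0], [0.0, 5.0]])
features = np.concatenate(
    [centre + 0.1 * rng.standard_normal((4, 2)) for centre in centres]
)
neighbours = mine_nearest_neighbours(features, k=3)

# Soft cluster assignments for three hypothetical samples and a uniform prior.
prior = np.array([0.5, 0.5])
p_a = np.array([0.9, 0.1])
p_b = np.array([0.9, 0.1])  # agrees with p_a
p_c = np.array([0.1, 0.9])  # disagrees with p_a
pmi_agree = generic_pmi(p_a, p_b, prior)
pmi_disagree = generic_pmi(p_a, p_c, prior)
```

In this toy setup each sample's three nearest neighbours come from its own group, and the score is positive for the pair whose assignments agree and negative for the pair whose assignments disagree. A training objective built on such a score would reward consistent assignments for neighbouring images.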

Code Repositories

HHU-MMBS/TEMI-official-BMVC2023 (official implementation, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
image-clustering-on-cifar-10 | TEMI DINO ViT-B | ARI: 0.885; Accuracy: 0.945; Backbone: ViT-B; NMI: 0.886; Train set: Train
image-clustering-on-cifar-10 | TEMI CLIP ViT-L (openai) | ARI: 0.932; Accuracy: 0.969; Backbone: ViT-L; NMI: 0.926; Train set: Train
image-clustering-on-cifar-100 | TEMI DINO ViT-B | ARI: 0.533; Accuracy: 0.671; NMI: 0.769; Train set: Train
image-clustering-on-cifar-100 | TEMI CLIP ViT-L (openai) | ARI: 0.612; Accuracy: 0.737; NMI: 0.799; Train set: Train
image-clustering-on-imagenet | TEMI DINO (ViT-B) | ARI: 45.9%; Accuracy: 58.0%; NMI: 81.4%
image-clustering-on-imagenet | TEMI MSN (ViT-L) | ARI: 48.4%; Accuracy: 61.6%; NMI: 82.5%
image-clustering-on-imagenet-100 | TEMI CLIP ViT-L (openai) | Accuracy: 0.8343; ARI: 0.7581; NMI: 0.9006
image-clustering-on-imagenet-100 | TEMI MSN ViT-L | Accuracy: 0.8286; ARI: 0.7408; NMI: 0.8853
image-clustering-on-imagenet-100 | TEMI DINO ViT-B | Accuracy: 0.7505; ARI: 0.6545; NMI: 0.8565
image-clustering-on-imagenet-200 | TEMI CLIP ViT-L (openai) | -
image-clustering-on-imagenet-200 | TEMI DINO ViT-B | -
image-clustering-on-imagenet-200 | TEMI MSN ViT-L | -
image-clustering-on-imagenet-50-1 | TEMI DINO ViT-B | Accuracy: 0.801; ARI: 0.7093; NMI: 0.8610
image-clustering-on-imagenet-50-1 | TEMI CLIP ViT-L (openai) | Accuracy: 0.8827; ARI: 0.8272; NMI: 0.9232
image-clustering-on-imagenet-50-1 | TEMI MSN ViT-L | Accuracy: 0.8487; ARI: 0.7646; NMI: 0.8814
image-clustering-on-stl-10 | TEMI DINO ViT-B | ARI: 0.968; Accuracy: 0.985; Backbone: ViT-B; NMI: 0.965; Train set: Train
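
Cluster indices are arbitrary, so an accuracy figure for a clustering only makes sense after predicted clusters are matched to ground-truth classes. A common convention is to choose the one-to-one matching that maximises the number of agreeing samples, using the Hungarian algorithm. The sketch below follows that convention; whether the numbers above were computed in exactly this way is an assumption, and `clustering_accuracy` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = int(max(y_true.max(), y_pred.max())) + 1
    # counts[i, j] = number of samples in predicted cluster i with true class j
    counts = np.zeros((n, n), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)
    # Hungarian matching that maximises the total number of agreeing samples
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size


# A pure relabelling of the classes is a perfect clustering.
acc_permuted = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0])

# One sample placed in the wrong cluster: 5 of 6 samples can be matched.
acc_one_error = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 2, 0, 0, 0])
```

ARI and NMI do not need this matching step because they are invariant to label permutations; scikit-learn provides them as `adjusted_rand_score` and `normalized_mutual_info_score`.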
