5 months ago

Unicom: Universal and Compact Representation Learning for Image Retrieval

Xiang An; Jiankang Deng; Kaicheng Yang; Jaiwei Li; Ziyong Feng; Jia Guo; Jing Yang; Tongliang Liu

Abstract

Modern image retrieval methods typically rely on fine-tuning pre-trained encoders to extract image-level descriptors. However, the most widely used models are pre-trained on ImageNet-1K, which covers a limited set of classes. The pre-trained feature representation is therefore not universal enough to generalize well to diverse open-world classes. In this paper, we first cluster the large-scale LAION400M dataset into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model. Because label granularity is ambiguous, the automatically clustered dataset inevitably contains heavy inter-class conflict. To alleviate such conflict, we randomly select a subset of inter-class prototypes to construct the margin-based softmax loss. To further enhance the low-dimensional feature representation, we randomly select a subset of feature dimensions when calculating the similarities between embeddings and class-wise prototypes. These two random partial selections act on the class dimension and the feature dimension of the prototype matrix, respectively, making the classification conflict-robust and the feature embedding compact. Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks. The code and pre-trained models are released to facilitate future research: https://github.com/deepglint/unicom.
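The two random selections described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function name `partial_margin_softmax_loss`, the parameters `class_ratio` and `dim_ratio`, the CosFace-style additive cosine margin, and the default margin and scale values are all assumptions made for illustration. The abstract only states that a margin-based softmax is built from randomly selected prototypes and feature dimensions.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def partial_margin_softmax_loss(embedding, prototypes, label,
                                class_ratio=0.1, dim_ratio=0.5,
                                margin=0.3, scale=64.0, rng=random):
    """Margin-based softmax loss for one sample (illustrative sketch).

    Assumptions, not taken from the source: the margin is an additive
    cosine margin (CosFace-style), and the default ratios, margin and
    scale are placeholder values.
    """
    dim = len(embedding)
    # Random partial selection over the feature dimension.
    kept_dims = rng.sample(range(dim), max(1, int(dim * dim_ratio)))

    # Random partial selection over the class dimension: keep only a
    # subset of the negative prototypes; the positive one is always kept.
    negatives = [c for c in range(len(prototypes)) if c != label]
    num_kept = min(len(negatives), max(1, int(len(negatives) * class_ratio)))
    kept_negatives = rng.sample(negatives, num_kept)

    x = l2_normalize([embedding[d] for d in kept_dims])

    def cosine(class_index):
        w = l2_normalize([prototypes[class_index][d] for d in kept_dims])
        return sum(a * b for a, b in zip(x, w))

    # The margin is applied to the positive class only.
    pos_logit = scale * (cosine(label) - margin)
    logits = [pos_logit] + [scale * cosine(c) for c in kept_negatives]

    # Numerically stable log-sum-exp, then cross-entropy for the positive.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - pos_logit
```

Because only a fraction of the negative prototypes enters each loss term, a pseudo class that conflicts with the positive one (for example, two clusters covering the same concept at different granularities) is penalized less often. Sampling feature dimensions means every random subset of the embedding has to be discriminative by itself, which is the intuition behind the compactness claim.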

Code Repositories

OML-Team/open-metric-learning (pytorch)

RocketFlash/easy_metric_learning/tree/master/tools (pytorch)

deepglint/unicom (Official, pytorch)

Benchmarks

Benchmark	Methodology	Metrics
image-classification-on-imagenet	Unicom (ViT-L/14@336px) (Finetuned)	Top 1 Accuracy: 88.3
image-retrieval-on-google-landmarks-dataset	UNICOM-ViT-B-16-512px	mAP@100: 35.7
image-retrieval-on-google-landmarks-dataset	UNICOM-ViT-L-14-512px	mAP@100: 36.4
image-retrieval-on-google-landmarks-dataset-1	UNICOM-ViT-L-14-512px	mAP@100: 33.1
image-retrieval-on-google-landmarks-dataset-1	UNICOM-ViT-B-16-512px	mAP@100: 32.4
image-retrieval-on-inaturalist	Unicom+ViT-L@336px	R@1: 88.9
image-retrieval-on-sop	Unicom+ViT-L@336px	R@1: 91.2
metric-learning-on-cars196	Unicom+ViT-L@336px	R@1: 98.2
metric-learning-on-cub-200-2011	Unicom+ViT-L@336px	R@1: 90.1
metric-learning-on-in-shop-1	Unicom+ViT-L@336px	R@1: 96.7
metric-learning-on-stanford-online-products-1	Unicom+ViT-L@336px	R@1: 91.2
self-supervised-image-classification-on	Unicom (ViT-B/16)	Number of Params: 80M; Top 1 Accuracy: 79.1%
self-supervised-image-classification-on	Unicom (ViT-B/32)	Number of Params: 80M; Top 1 Accuracy: 75.0%
