Dataset Summarization by K Principal Concepts

Niv Cohen, Yedid Hoshen

Abstract

We propose the new task of K principal concept identification for dataset summarization. The objective is to find a set of K concepts that best explain the variation within the dataset. Concepts are high-level, human-interpretable terms such as "tiger", "kayaking" or "happy". The K concepts are selected from a (potentially long) input list of candidates, which we call the concept-bank. The concept-bank may be taken from a generic dictionary or constructed from task-specific prior knowledge. An image-language embedding method (e.g. CLIP) is used to map the images and the concept-bank into a shared feature space. To select the K concepts that best explain the data, we formulate our problem as a K-uncapacitated facility location problem. An efficient optimization technique is used to scale the local search algorithm to very large concept-banks. The output of our method is a set of K principal concepts that summarize the dataset. Our approach provides a more explicit summary than selecting K representative images, which are often ambiguous. As a further application of our method, the K principal concepts can be used to classify the dataset into K groups. Extensive experiments demonstrate the efficacy of our approach.
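The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that images and concepts are already embedded in a shared space (toy 2-D vectors stand in for CLIP features here), that the cost of a concept set is the sum of each image's cosine distance to its nearest selected concept, and that a plain single-swap local search is used. The function names, the toy data, and the exact objective are assumptions made for illustration; the abstract does not specify the paper's objective in detail, and the efficient optimization technique it mentions is not reproduced.

```python
import math
from itertools import product

def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def summary_cost(dist, selected):
    """Sum over images of the distance to the nearest selected concept."""
    return sum(min(row[j] for j in selected) for row in dist)

def select_principal_concepts(image_embs, concept_embs, k):
    """Pick k concept indices by single-swap local search (illustrative only)."""
    # dist[i][j] is the distance from image i to concept j.
    dist = [[cosine_distance(x, c) for c in concept_embs] for x in image_embs]
    selected = list(range(k))  # arbitrary starting set (an assumption)
    best = summary_cost(dist, selected)
    improved = True
    while improved:
        improved = False
        # Try replacing each selected concept with each unselected candidate.
        for pos, cand in product(range(k), range(len(concept_embs))):
            if cand in selected:
                continue
            trial = selected[:pos] + [cand] + selected[pos + 1:]
            cost = summary_cost(dist, trial)
            if cost < best - 1e-12:
                selected, best = trial, cost
                improved = True
                break  # restart the scan from the improved set
    return sorted(selected), best

def assign_to_concepts(image_embs, concept_embs, selected):
    """Label each image with the index of its nearest selected concept."""
    return [
        min(selected, key=lambda j: cosine_distance(x, concept_embs[j]))
        for x in image_embs
    ]

# Toy concept-bank and hand-made embeddings (hypothetical values, not CLIP output).
concept_names = ["happy", "tiger", "kayaking"]
concept_embs = [[0.7, 0.7], [1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.9, 0.1], [1.0, 0.05], [0.1, 0.9], [0.05, 1.0]]

selected, cost = select_principal_concepts(image_embs, concept_embs, k=2)
labels = assign_to_concepts(image_embs, concept_embs, selected)
principal = [concept_names[j] for j in selected]
```

On this toy data the search starts from the first two concepts, swaps "happy" for "kayaking", and then stops because no further swap lowers the cost. The final call to `assign_to_concepts` mirrors the classification use mentioned in the abstract: each image is grouped under its nearest principal concept.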

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| image-clustering-on-cifar-10 | Single-Noun Prior | ARI: 0.702; Accuracy: 0.853; NMI: 0.731; Backbone: ViT-B-32; Train set: Train+Test |
| image-clustering-on-imagenet-100 | Single-Noun Prior | Accuracy: 0.731; ARI: 0.628; NMI: 0.805 |
| image-clustering-on-imagenet-200 | Single-Noun Prior | - |
| image-clustering-on-imagenet-50-1 | Single-Noun Prior | Accuracy: 0.827; ARI: 0.744; NMI: 0.847 |
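
For readers unfamiliar with the ARI figures reported above, the following is a minimal sketch of the standard adjusted Rand index, computed from the contingency counts of two labelings. It is not the evaluation code behind these benchmark numbers; the function name and the toy labels are illustrative assumptions.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_true)
    # Count items per (true, predicted) cell and per cluster on each side.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    # Number of item pairs that fall together in each cell or cluster.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = sum_true * sum_pred / comb(n, 2)  # chance-level agreement
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_pairs - expected) / (max_index - expected)

# The score ignores cluster names: a relabeled perfect clustering scores 1.0.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A clustering that splits every true group scores below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index is corrected for chance, a random assignment scores near 0, which is why ARI values are typically lower than the accuracy reported for the same benchmark.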
