GLAMI-1M: A Multilingual Image-Text Fashion Dataset

Vaclav Kosar; Antonín Hoskovec; Milan Šulc; Radek Bartyzal

Abstract

We introduce GLAMI-1M: the largest multilingual image-text classification dataset and benchmark. The dataset contains images of fashion products with item descriptions, each in one of 13 languages. Categorization into 191 classes has high-quality annotations: all 100k images in the test set and 75% of the 1M training set were human-labeled. The paper presents baselines for image-text classification showing that the dataset presents a challenging fine-grained classification problem: the best-scoring EmbraceNet model, using both visual and textual features, achieves 69.7% accuracy. Experiments with a modified Imagen model show the dataset is also suitable for image generation conditioned on text. The dataset, source code, and model checkpoints are published at https://github.com/glami/glami-1m

Code Repositories

glami/glami-1m (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Top 1 Accuracy % | Top 5 Accuracy %
multi-lingual-image-text-classification-on | EmbraceNet (image+text) | 69.7 | 94.0
multi-lingual-image-text-classification-on | CLIP (zero-shot image+text) | 32.3 | 74.5
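
The Top 1 and Top 5 figures above follow the usual top-k accuracy definition: a sample counts as correct when its true class is among the k highest-scoring predictions. The following is a minimal pure-Python sketch of that metric, not the authors' evaluation code; the function name `top_k_accuracy` and the toy scores are hypothetical and exist only for illustration.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int = 1) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; ties go to the lower index.
        top_k = sorted(range(len(row)), key=lambda c: -row[c])[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example with 3 classes and 4 samples (made-up scores, not GLAMI-1M data).
toy_scores = [
    [0.7, 0.2, 0.1],  # highest score: class 0
    [0.1, 0.3, 0.6],  # highest score: class 2
    [0.4, 0.5, 0.1],  # highest score: class 1
    [0.2, 0.3, 0.5],  # highest score: class 2
]
toy_labels = [0, 1, 1, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With 191 classes, as in GLAMI-1M, each row of `scores` would hold 191 values and `k=5` would give the Top 5 figure; libraries such as PyTorch offer vectorized equivalents for large evaluation sets.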
