Sigmoid Loss for Language Image Pre-Training

Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer

Abstract

We propose a simple pairwise Sigmoid loss for Language-Image Pre-training (SigLIP). Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs and does not require a global view of the pairwise similarities for normalization. The sigmoid loss simultaneously allows further scaling up the batch size, while also performing better at smaller batch sizes. Combined with Locked-image Tuning, with only four TPUv4 chips, we train a SigLiT model that achieves 84.5% ImageNet zero-shot accuracy in two days. The disentanglement of the batch size from the loss further allows us to study the impact of examples vs. pairs and the negative-to-positive ratio. Finally, we push the batch size to the extreme, up to one million, and find that the benefits of growing the batch size quickly diminish, with a more reasonable batch size of 32k being sufficient. We release our models at https://github.com/google-research/big_vision and hope our research motivates further explorations in improving the quality and efficiency of language-image pre-training.
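
The loss itself is compact enough to state directly: for a mini-batch of n image-text pairs, every image embedding is scored against every text embedding, and each of the resulting n² logits is trained as an independent binary classification, with label +1 for the n matching pairs on the diagonal and -1 for the remaining negatives. The sketch below, written in JAX to match the official big_vision release, shows one way to implement this; the names (siglip_loss, img_emb, txt_emb, t_prime, b) are illustrative assumptions, not identifiers taken from the repository.

```python
import jax
import jax.numpy as jnp

def siglip_loss(img_emb, txt_emb, t_prime, b):
    """Pairwise sigmoid loss for a mini-batch of n image-text pairs.

    img_emb, txt_emb: [n, d] unnormalized embeddings from the two towers.
    t_prime, b: learnable scalars (log-temperature and bias).
    """
    n = img_emb.shape[0]
    t = jnp.exp(t_prime)  # temperature, parameterized in log space
    # L2-normalize so the dot products are cosine similarities.
    zimg = img_emb / jnp.linalg.norm(img_emb, axis=-1, keepdims=True)
    ztxt = txt_emb / jnp.linalg.norm(txt_emb, axis=-1, keepdims=True)
    logits = zimg @ ztxt.T * t + b        # [n, n] pairwise scores
    labels = 2.0 * jnp.eye(n) - 1.0      # +1 on the diagonal, -1 off it
    # Each (image, text) pair contributes an independent binary term, so no
    # batch-wide softmax normalization is needed.
    return -jnp.sum(jax.nn.log_sigmoid(labels * logits)) / n

# Illustrative call with random embeddings; the paper initializes the
# temperature as t' = log(10) and the bias as b = -10 so that training
# starts close to the heavily negative-dominated label prior.
img = jax.random.normal(jax.random.PRNGKey(0), (8, 256))
txt = jax.random.normal(jax.random.PRNGKey(1), (8, 256))
loss = siglip_loss(img, txt, t_prime=jnp.log(10.0), b=-10.0)
```

Because the terms decompose over pairs, the loss can be computed in chunks across devices without gathering a full similarity matrix, which is what makes the very large batch sizes studied in the paper practical.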

Code Repositories

- mlfoundations/open_clip (PyTorch)
- ramanakshay/clip (PyTorch)
- google-research/big_vision (official; JAX)
- huggingface/transformers (PyTorch)
- filipbasara0/relic (PyTorch)
- merveenoyan/siglip (PyTorch)
- filipbasara0/simple-clip (PyTorch)
- morrisfl/unifex (PyTorch)
- borisdayma/clip-jax (JAX)
- apple/ml-mobileclip (PyTorch)

Benchmarks

Benchmark: image-to-text retrieval on COCO
Methodology: SigLIP (ViT-L, zero-shot)
Metric: Recall@1 = 70.6
