ImageBind: One Embedding Space To Bind Them All

Rohit Girdhar; Alaaeldin El-Nouby; Zhuang Liu; Mannat Singh; Kalyan Vasudev Alwala; Armand Joulin; Ishan Misra

Abstract

We present ImageBind, an approach to learn a joint embedding across six different modalities - images, text, audio, depth, thermal, and IMU data. We show that all combinations of paired data are not necessary to train such a joint embedding, and only image-paired data is sufficient to bind the modalities together. ImageBind can leverage recent large scale vision-language models, and extends their zero-shot capabilities to new modalities just by using their natural pairing with images. It enables novel emergent applications 'out-of-the-box' including cross-modal retrieval, composing modalities with arithmetic, cross-modal detection and generation. The emergent capabilities improve with the strength of the image encoder and we set a new state-of-the-art on emergent zero-shot recognition tasks across modalities, outperforming specialist supervised models. Finally, we show strong few-shot recognition results outperforming prior work, and that ImageBind serves as a new way to evaluate vision models for visual and non-visual tasks.
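The cross-modal retrieval and "composing modalities with arithmetic" described in the abstract can be illustrated with a minimal sketch. It assumes every modality has already been mapped to vectors in one shared space. The 3-dimensional embeddings, the gallery names, and the helper functions `normalize`, `cosine`, `retrieve`, and `compose` below are made up for illustration and are not part of the ImageBind code or its outputs.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def retrieve(query, gallery):
    """Return gallery keys ordered from most to least similar to the query."""
    return sorted(gallery, key=lambda k: cosine(query, gallery[k]), reverse=True)

def compose(a, b):
    """Add two unit-normalized embeddings and renormalize the sum."""
    na, nb = normalize(a), normalize(b)
    return normalize([x + y for x, y in zip(na, nb)])

# Hypothetical embeddings in a shared space (toy values, not model outputs).
image_gallery = {
    "dog_on_beach": [0.7, 0.7, 0.0],
    "dog_indoors": [1.0, 0.1, 0.0],
    "empty_beach": [0.1, 1.0, 0.0],
}
audio_bark = [1.0, 0.0, 0.1]   # stands in for an audio embedding
image_beach = [0.0, 1.0, 0.1]  # stands in for an image embedding

# Audio-to-image retrieval: rank images by similarity to the audio query.
ranked_audio = retrieve(audio_bark, image_gallery)

# Composition: combine the audio and image queries, then retrieve.
ranked_combo = retrieve(compose(audio_bark, image_beach), image_gallery)

print(ranked_audio[0], ranked_combo[0])
```

With these toy values, the audio query alone ranks the indoor dog image first, while the composed query ranks the image containing both concepts first. Real embeddings would come from the trained modality encoders.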

Code Repositories

klemens-floege/oneprot (PyTorch, mentioned in GitHub)
ginihumer/amumo (JAX, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
sound-prompted-semantic-segmentation-on | ImageBIND | mAP: 19.7; mIoU: 20.5
speech-prompted-semantic-segmentation-on | ImageBIND | mAP: 20.2; mIoU: 19.7
temporal-relation-extraction-on-vinoground | ImageBind | Group Score: 0.6; Text Score: 9.4; Video Score: 3.4
zero-shot-video-retrieval-on-msr-vtt | ImageBind | text-to-video R@1: 36.8; text-to-video R@5: 61.8; text-to-video R@10: 70.0
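For readers unfamiliar with the R@K retrieval metrics reported above, the following is a minimal sketch of how Recall@K is commonly computed: the percentage of queries whose correct item appears among the top K retrieved results. The function name `recall_at_k` and the toy rankings are assumptions for illustration. This is not the evaluation code behind the listed numbers.

```python
def recall_at_k(ranked_lists, ground_truth, k):
    """Percentage of queries whose correct item is within the top-k results."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, ground_truth)
        if target in ranked[:k]
    )
    return 100.0 * hits / len(ground_truth)

# Hypothetical ranked video IDs for four text queries (toy data).
ranked_lists = [
    ["v0", "v1", "v2"],  # correct item v0 is at rank 1
    ["v2", "v1", "v0"],  # correct item v1 is at rank 2
    ["v0", "v1", "v2"],  # correct item v2 is at rank 3
    ["v3", "v0", "v1"],  # correct item v3 is at rank 1
]
ground_truth = ["v0", "v1", "v2", "v3"]

r1 = recall_at_k(ranked_lists, ground_truth, 1)
r2 = recall_at_k(ranked_lists, ground_truth, 2)
r3 = recall_at_k(ranked_lists, ground_truth, 3)
print(r1, r2, r3)
```

Recall@K is non-decreasing in K, which is consistent with the reported R@1, R@5, and R@10 values increasing in that order.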
