

Learning to Prompt for Vision-Language Models

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu


Abstract

Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming: one needs to spend a significant amount of time on word tuning, since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP), we propose Context Optimization (CoOp), a simple approach specifically for adapting CLIP-like vision-language models for downstream image recognition. Concretely, CoOp models a prompt's context words with learnable vectors while the entire pre-trained parameters are kept fixed. To handle different image recognition tasks, we provide two implementations of CoOp: unified context and class-specific context. Through extensive experiments on 11 datasets, we demonstrate that CoOp requires as few as one or two shots to beat hand-crafted prompts by a decent margin and is able to gain significant improvements over prompt engineering with more shots, e.g., with 16 shots the average gain is around 15% (with the highest reaching over 45%). Despite being a learning-based approach, CoOp achieves superb domain generalization performance compared with the zero-shot model using hand-crafted prompts.
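The core idea of the abstract — learnable context vectors prepended to a fixed class-name embedding, with the pre-trained encoders frozen — can be sketched in a few lines. The snippet below is a minimal illustration with NumPy stand-ins for CLIP's encoders; the function and variable names, dimensions, and the toy `encode_text` pooling are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch of CoOp-style unified-context prompts.
# All names, dimensions, and the toy encoder below are assumptions,
# not the paper's actual code (which builds on CLIP in PyTorch).

rng = np.random.default_rng(0)
dim, n_ctx, n_classes = 8, 4, 3   # embedding dim, context length, #classes

# Learnable context vectors [V]_1 ... [V]_M, shared across all classes
# (the "unified context" variant); these would be optimized by gradient descent.
context = rng.normal(size=(n_ctx, dim))

# Fixed (pre-trained) word embeddings for each class name.
class_embeds = rng.normal(size=(n_classes, dim))

def encode_text(tokens):
    """Toy stand-in for CLIP's frozen text encoder: mean-pool + L2-normalize."""
    feat = tokens.mean(axis=0)
    return feat / np.linalg.norm(feat)

# Build one prompt per class: [V]_1 ... [V]_M [CLASS]
text_features = np.stack([
    encode_text(np.vstack([context, class_embeds[c][None]]))
    for c in range(n_classes)
])

# Classification weights are the prompt features; prediction is the
# cosine similarity between them and the image feature.
image_feature = rng.normal(size=dim)
image_feature /= np.linalg.norm(image_feature)
logits = text_features @ image_feature
print(logits.shape)  # prints (3,): one score per class; argmax is the prediction
```

In the class-specific variant described in the abstract, each class would get its own `context` array instead of the shared one; only these context vectors receive gradients, while the encoder parameters stay fixed.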

Code Repositories

kaiyangzhou/coop (Official, PyTorch)
vill-lab/2024-aaai-hpt (PyTorch)
kenomo/industrial-clip (PyTorch)
hhenryd/tap (PyTorch)
ThomasWangY/2024-AAAI-HPT (PyTorch)
ArsenalCheng/Meta-Adapter (PyTorch)
saic-fi/bayesian-prompt-learning (PyTorch)
farinamatteo/zero (PyTorch)
Gahyeonkim09/AAPL (PyTorch)
kaiyangzhou/on-device-dg (PyTorch)
srvcodes/clap4clip (PyTorch)
muzairkhattak/protext (PyTorch)
healthx-lab/biomedcoop (PyTorch)
YangYongJin/APEX (PyTorch)
mlvlab/dapt (PyTorch)
Vill-Lab/2024-TIP-MetaPrompt (PyTorch)
azshue/TPT (PyTorch)

Benchmarks

Benchmark: few-shot-age-estimation-on-morph-album2
Methodology: CoOp
Metrics:
  MAE: 5.09
  MAE (2 shot): 4.50
  MAE (4 shot): 3.81
  MAE (8 shot): 3.57
  MAE (16 shot): 3.23

