Context-Aware Robust Fine-Tuning

Xiaofeng Mao Yuefeng Chen Xiaojun Jia Rong Zhang Hui Xue Zhao Li

Abstract

Contrastive Language-Image Pre-trained (CLIP) models can classify an image as belonging to "[CLASS]" in a zero-shot manner by measuring the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]". Thanks to the exhaustive text cues in "[CONTEXT]", a CLIP model is aware of different contexts, e.g. background, style, and viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent works find that further fine-tuning of CLIP models improves accuracy but sacrifices robustness on downstream tasks. We conduct an empirical investigation showing that fine-tuning corrupts the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning so that it captures context information. Specifically, we use zero-shot prompt weights to obtain the context distribution contained in the image. By minimizing the Kullback-Leibler Divergence (KLD) between the context distributions induced by the original and fine-tuned CLIP models, CAR-FT lets downstream tasks inherit the context-aware ability of CLIP and achieves higher accuracy both In-Distribution (ID) and Out-Of-Distribution (OOD). Experimental results show that CAR-FT achieves superior robustness on five OOD test datasets of ImageNet while also bringing accuracy gains on nine downstream tasks. Additionally, CAR-FT surpasses previous Domain Generalization (DG) methods and reaches 78.5% average accuracy on the DomainBed benchmark, setting a new state of the art.
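The regularizer described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: random NumPy arrays stand in for CLIP image features and for the zero-shot context prompt weights, the function names and the `temperature` parameter are hypothetical, and the direction of the divergence (original model as the reference distribution) is an assumption, since the abstract does not specify it.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def context_log_distribution(image_features, context_weights, temperature=1.0):
    """Log-probabilities over contexts from image/prompt cosine similarity.

    image_features:  (batch, dim) array, stand-in for CLIP image embeddings.
    context_weights: (num_contexts, dim) array, stand-in for text embeddings
                     of context prompts (assumed to be precomputed).
    temperature:     hypothetical scaling factor, not taken from the paper.
    """
    similarities = l2_normalize(image_features) @ l2_normalize(context_weights).T
    return log_softmax(similarities / temperature)


def context_distribution(image_features, context_weights, temperature=1.0):
    """Context distribution: softmax over the similarities to each context."""
    return np.exp(context_log_distribution(image_features, context_weights, temperature))


def context_kl_loss(features_original, features_finetuned, context_weights, temperature=1.0):
    """Mean KL(p_original || p_finetuned) over the batch (direction is an assumption)."""
    log_p = context_log_distribution(features_original, context_weights, temperature)
    log_q = context_log_distribution(features_finetuned, context_weights, temperature)
    kl_per_image = np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)
    return float(kl_per_image.mean())


# Toy usage with random stand-in data (no real CLIP model involved).
rng = np.random.default_rng(0)
context_weights = rng.normal(size=(4, 8))    # 4 contexts, 8-dim embeddings
features_original = rng.normal(size=(3, 8))  # features from the frozen model
features_finetuned = features_original + 0.5 * rng.normal(size=(3, 8))

print("context distribution:", context_distribution(features_original, context_weights))
print("KL regularizer:", context_kl_loss(features_original, features_finetuned, context_weights))
```

In a training loop, a term like this would presumably be added to the downstream task loss with some weighting coefficient; both the additive form and the weight are assumptions here, as the abstract only states that the model is regularized during fine-tuning.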

Benchmarks

Benchmark                                  Methodology                      Metrics
domain-generalization-on-domainnet         CAR-FT (CLIP, ViT-B/16)          Average Accuracy: 62.5
domain-generalization-on-imagenet-a        CAR-FT (CLIP, ViT-L/14@336px)    Top-1 accuracy %: 81.5
domain-generalization-on-imagenet-r        CAR-FT (CLIP, ViT-L/14@336px)    Top-1 Error Rate: 10.3
domain-generalization-on-imagenet-sketch   CAR-FT (CLIP, ViT-L/14@336px)    Top-1 accuracy: 65.5
domain-generalization-on-office-home       CAR-FT (CLIP, ViT-B/16)          Average Accuracy: 85.7
domain-generalization-on-pacs-2            CAR-FT (CLIP, ViT-B/16)          Average Accuracy: 96.8
domain-generalization-on-terraincognita    CAR-FT (CLIP, ViT-B/16)          Average Accuracy: 61.9
domain-generalization-on-vlcs              CAR-FT (CLIP, ViT-B/16)          Average Accuracy: 85.5
