Context-Aware Robust Fine-Tuning

Xiaofeng Mao, Yuefeng Chen, Xiaojun Jia, Rong Zhang, Hui Xue, Zhao Li

Abstract

Contrastive Language-Image Pre-trained (CLIP) models can classify an image as belonging to "[CLASS]" in a zero-shot manner by using the similarity between the image and the prompt sentence "a [CONTEXT] of [CLASS]". Thanks to the exhaustive text cues in "[CONTEXT]", the CLIP model is aware of different contexts, e.g., background, style, and viewpoint, and exhibits unprecedented robustness against a wide range of distribution shifts. However, recent works find that further fine-tuning of CLIP models improves accuracy but sacrifices robustness on downstream tasks. We conduct an empirical investigation showing that fine-tuning corrupts the context-aware ability of pre-trained CLIP features. To solve this problem, we propose Context-Aware Robust Fine-tuning (CAR-FT). CAR-FT regularizes the model during fine-tuning so that it captures context information. Specifically, we use zero-shot prompt weights to obtain the context distribution contained in the image. By minimizing the Kullback-Leibler Divergence (KLD) between the context distributions induced by the original and fine-tuned CLIP models, CAR-FT lets downstream tasks inherit the context-aware ability of CLIP and achieves both higher In-Distribution (ID) and Out-Of-Distribution (OOD) accuracy. Experimental results show that CAR-FT achieves superior robustness on five OOD test datasets of ImageNet while also bringing accuracy gains on nine downstream tasks. Additionally, CAR-FT surpasses previous Domain Generalization (DG) methods and reaches 78.5% average accuracy on the DomainBed benchmark, establishing a new state of the art.
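The regularizer described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the three context prompts, the toy 3-dimensional features, the logit scale, and the direction of the KL term (pre-trained distribution as the reference) are all assumptions made for illustration. In practice the context weights would presumably come from CLIP's text encoder and the features from the frozen and fine-tuned image encoders.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def context_distribution(image_feature, context_weights, scale=100.0):
    """Softmax over scaled cosine similarities between one image feature
    and each context prompt embedding (`scale` is an assumed logit scale)."""
    f = l2_normalize(image_feature)
    logits = []
    for w in context_weights:
        w = l2_normalize(w)
        logits.append(scale * sum(a * b for a, b in zip(f, w)))
    return softmax(logits)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical context prompt embeddings (stand-ins for text-encoder outputs).
context_weights = [
    [1.0, 0.0, 0.0],  # e.g. "a photo of"
    [0.0, 1.0, 0.0],  # e.g. "a sketch of"
    [0.0, 0.0, 1.0],  # e.g. "a painting of"
]

# Hypothetical features of the same image from the two encoders.
feat_pretrained = [0.9, 0.3, 0.1]  # frozen, original CLIP image encoder
feat_finetuned = [0.5, 0.5, 0.5]   # encoder being fine-tuned

p_pre = context_distribution(feat_pretrained, context_weights, scale=10.0)
p_ft = context_distribution(feat_finetuned, context_weights, scale=10.0)

# Context-aware regularization term: penalize drift of the fine-tuned
# model's context distribution away from the pre-trained one.
car_loss = kl_divergence(p_pre, p_ft)
print(p_pre, p_ft, car_loss)
```

In this toy case the fine-tuned feature is equally similar to every context, so its context distribution is uniform and the KL term is positive. During training such a term would presumably be added to the task loss with a weighting coefficient, which the abstract does not specify.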

Benchmarks

Benchmark	Methodology	Metrics
domain-generalization-on-domainnet	CAR-FT (CLIP, ViT-B/16)	Average Accuracy: 62.5
domain-generalization-on-imagenet-a	CAR-FT (CLIP, ViT-L/14@336px)	Top-1 accuracy %: 81.5
domain-generalization-on-imagenet-r	CAR-FT (CLIP, ViT-L/14@336px)	Top-1 Error Rate: 10.3
domain-generalization-on-imagenet-sketch	CAR-FT (CLIP, ViT-L/14@336px)	Top-1 accuracy: 65.5
domain-generalization-on-office-home	CAR-FT (CLIP, ViT-B/16)	Average Accuracy: 85.7
domain-generalization-on-pacs-2	CAR-FT (CLIP, ViT-B/16)	Average Accuracy: 96.8
domain-generalization-on-terraincognita	CAR-FT (CLIP, ViT-B/16)	Average Accuracy: 61.9
domain-generalization-on-vlcs	CAR-FT (CLIP, ViT-B/16)	Average Accuracy: 85.5
