Zero Shot Transfer Image Classification On 3
Evaluation Metrics
Accuracy (Private)
Accuracy (Public)
Evaluation Results
Performance of each model on this benchmark is listed in the table below; a sketch of the zero-shot evaluation protocol follows the table.
| Model | Accuracy (Private) | Accuracy (Public) | Paper Title | Repository |
|---|---|---|---|---|
| BASIC (Lion) | 81.2 | - | - | - |
| LiT-22B | 80.9 | - | Scaling Vision Transformers to 22 Billion Parameters | - |
| CoCa | 80.7 | - | CoCa: Contrastive Captioners are Image-Text Foundation Models | - |
| LiT ViT-e | 80.6 | - | PaLI: A Jointly-Scaled Multilingual Language-Image Model | - |
| BASIC | 80.6 | - | Combined Scaling for Zero-shot Transfer Learning | - |
| LiT-tuning | 78.7 | 66.6 | LiT: Zero-Shot Transfer with Locked-image text Tuning | - |
| EVA-CLIP-18B | 77.9 | - | EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | - |
| InternVL-C | 77.3 | - | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | - |
| EVA-CLIP-E/14+ | 75.7 | - | EVA-CLIP: Improved Training Techniques for CLIP at Scale | - |
| ALIGN | 70.1 | - | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | - |
| CLIP | 70.1 | - | Learning Transferable Visual Models From Natural Language Supervision | - |
| AltCLIP | 68.1 | - | AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | - |
| PaLI | 64.46 | - | PaLI: A Jointly-Scaled Multilingual Language-Image Model | - |
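The metric reported above is top-1 zero-shot transfer accuracy: the model is never fine-tuned on the benchmark and classifies each image by matching it against text prompts built from the class names. Below is a minimal sketch of such an evaluation using the open_clip library; the model name, checkpoint, prompt template, and ImageFolder-style dataset layout are illustrative assumptions, not the exact setups used by the entries in the table.

```python
import torch
import open_clip
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Model and checkpoint names are assumptions for illustration,
# not the checkpoints evaluated on this leaderboard.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()


def zero_shot_accuracy(dataset_dir: str) -> float:
    """Top-1 zero-shot accuracy over an ImageFolder-style dataset.

    The model is not fine-tuned on the benchmark: class names are only
    used to build text prompts, matching the zero-shot transfer protocol.
    """
    dataset = ImageFolder(dataset_dir, transform=preprocess)
    loader = DataLoader(dataset, batch_size=64, num_workers=4)

    # Encode one prompt per class; dataset.classes keeps the order
    # consistent with the integer labels returned by the loader.
    prompts = [f"a photo of a {name}" for name in dataset.classes]
    with torch.no_grad():
        text_features = model.encode_text(tokenizer(prompts))
        text_features /= text_features.norm(dim=-1, keepdim=True)

        correct, total = 0, 0
        for images, labels in loader:
            image_features = model.encode_image(images)
            image_features /= image_features.norm(dim=-1, keepdim=True)
            # Cosine similarity against every class prompt; predict the argmax.
            logits = image_features @ text_features.T
            correct += (logits.argmax(dim=-1) == labels).sum().item()
            total += labels.numel()
    return correct / total
```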