Image Retrieval on COCO
Evaluation Metric
Recall@10
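Recall@10 is the share of queries whose correct image appears among the ten highest-ranked candidates. The following is a minimal sketch of that computation, not the evaluation code used by any of the listed papers. It assumes text-to-image retrieval with exactly one ground-truth image per caption query and a precomputed similarity matrix; the function name `recall_at_k` and the toy scores are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]],
                ground_truth: Sequence[int],
                k: int = 10) -> float:
    """Fraction of queries whose matching image ranks in the top k.

    similarity[i][j] is the score between query i and image j (higher is
    better); ground_truth[i] is the index of the correct image for query i.
    Assumption: each query has exactly one correct image.
    """
    if len(similarity) != len(ground_truth):
        raise ValueError("one ground-truth index is required per query")
    if not similarity:
        raise ValueError("at least one query is required")
    hits = 0
    for scores, target in zip(similarity, ground_truth):
        # Rank image indices by descending score and keep the top k.
        top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
        hits += target in top_k
    return hits / len(ground_truth)


# Toy example: 3 queries, 4 candidate images (hypothetical scores).
toy_scores = [
    [0.9, 0.1, 0.3, 0.2],  # correct image 0 is ranked 1st
    [0.8, 0.2, 0.7, 0.1],  # correct image 1 is ranked 3rd
    [0.1, 0.2, 0.3, 0.9],  # correct image 2 is ranked 2nd
]
toy_truth = [0, 1, 2]
```

The function returns a fraction between 0 and 1, whereas the scores in the table are percentages, so multiply the result by 100 to compare. With the toy data, `recall_at_k(toy_scores, toy_truth, k=2)` counts two of the three queries as hits.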
Results
Performance of each model on this benchmark
| Model | Recall@10 | Paper Title | Repository |
|---|---|---|---|
| Oscar | 98.3 | Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks | |
| BLIP-2 ViT-G (fine-tuned) | 92.6 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| VisualSparta | 96.3 | VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words | |
| FLAVA (zero-shot) | - | FLAVA: A Foundational Language And Vision Alignment Model | |
| CLIP (zero-shot) | - | FLAVA: A Foundational Language And Vision Alignment Model | |
| BLIP-2 ViT-L (fine-tuned) | 91.8 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |