Image Captioning On Flickr30K Captions Test
Evaluation Metrics
CIDEr
SPICE
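As a rough illustration of what the CIDEr column measures, here is a minimal Python sketch of plain CIDEr. Each caption becomes a TF-IDF-weighted n-gram vector for n = 1 to 4, where an n-gram's IDF is low if it occurs in the references of many images. The candidate is compared with each reference by cosine similarity, and the results are averaged over references, n-gram orders (uniform weights), and images. Several assumptions are made here. The input is taken to be already tokenized, and the helper names `ngram_counts`, `cosine`, and `cider` are hypothetical. The sketch also omits the Gaussian length penalty, count clipping, and factor of 10 used by the CIDEr-D variant in the widely used coco-caption (pycocoevalcap) scorer, so it is not expected to reproduce the numbers in the table.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(gram, 0.0) for gram, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cider(candidates, references, max_n=4):
    """Plain CIDEr (hypothetical helper; no CIDEr-D penalty, clipping, or x10).

    candidates: one token list per image.
    references: one list of reference token lists per image.
    Returns (corpus_score, per_image_scores).
    """
    num_images = len(references)
    per_image = [0.0] * num_images
    for n in range(1, max_n + 1):
        # Document frequency: in how many images' reference sets each n-gram occurs.
        doc_freq = Counter()
        for refs in references:
            seen = set()
            for ref in refs:
                seen.update(ngram_counts(ref, n))
            doc_freq.update(seen)

        def tfidf(tokens):
            # Raw counts are enough because cosine similarity ignores scaling;
            # max(1, df) keeps n-grams absent from every reference finite.
            return {
                gram: count * math.log(num_images / max(1, doc_freq[gram]))
                for gram, count in ngram_counts(tokens, n).items()
            }

        for i, (cand, refs) in enumerate(zip(candidates, references)):
            cand_vec = tfidf(cand)
            sims = [cosine(cand_vec, tfidf(ref)) for ref in refs]
            # Average over references, then weight each n-gram order by 1/max_n.
            per_image[i] += (sum(sims) / len(sims)) / max_n
    return sum(per_image) / num_images, per_image
```

For example, with two images whose candidates match their single references exactly, `cider` returns a corpus score of 1.0 (up to floating-point rounding). Replacing the first candidate with an unrelated word such as `["zebra"]` drops that image to 0.0 and the corpus score to 0.5. An n-gram that appears in every image's references gets an IDF of zero, so a word like "a" in this toy corpus contributes nothing.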
Results
Performance of each model on this benchmark
| Model | CIDEr | SPICE | Paper Title | Repository |
|---|---|---|---|---|
| Unified VLP | 67.4 | 17 | Unified Vision-Language Pre-Training for Image Captioning and VQA | |
| KOSMOS-1 1.6B (zero-shot) | 67.1 | 14.5 | - | - |
| Cornia et al. | 46.4 | - | Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention | - |
| MetaLM | 43.3 | 11.7 | Language Models are General-Purpose Interfaces | |
| FewVLM | 31.0 | 10.0 | A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | |
| BRNN | 24.7 | - | Deep Visual-Semantic Alignments for Generating Image Descriptions | |
| VL-T5 | 2.6 | 2.0 | Unifying Vision-and-Language Tasks via Text Generation | |