Image Captioning on nocaps (Entire)
Evaluation Metrics
B1
B2
B3
B4
CIDEr
METEOR
ROUGE-L
SPICE
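
B1 through B4 conventionally denote BLEU scores computed with n-grams up to length 1 through 4. As a rough illustration of what those columns measure, the sketch below computes the clipped (modified) n-gram precision that BLEU is built on. It is a minimal sketch, not the benchmark's official scorer: the function names and the example captions are hypothetical, tokenization is plain whitespace splitting, and the brevity penalty and the geometric mean over n-gram orders are omitted.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate against several references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0

    # For every n-gram, record its maximum count over all references.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical captions, for illustration only.
candidate = "a dog runs on the grass".split()
references = [
    "a dog is running on the grass".split(),
    "the dog runs across a lawn".split(),
]

p1 = clipped_precision(candidate, references, 1)
p2 = clipped_precision(candidate, references, 2)
print(f"unigram precision: {p1:.2f}, bigram precision: {p2:.2f}")
```

The clipping step is what stops a caption that repeats one matching word from scoring well: a candidate of "the the the" against a reference of "the cat" is credited for only one of its three unigrams.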
Evaluation Results
Performance of each model on this benchmark
| Model | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| Lyrics | - | - | - | - | 126.8 | - | - | - | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | - |
| GIT, Single Model | 88.1 | 74.81 | 57.68 | 37.35 | 123.39 | 32.5 | 63.12 | 15.94 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| CoCa - Google Brain | 87.01 | 73.71 | 56.88 | 37.71 | 120.55 | 32.29 | 62.52 | 15.47 | - | - |
| Microsoft Cognitive Services team | 85.62 | 71.36 | 53.62 | 34.65 | 114.25 | 31.27 | 61.2 | 14.85 | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - |
| Prismer | 84.87 | 69.99 | 52.48 | 33.66 | 110.84 | 31.13 | 60.55 | 14.91 | Prismer: A Vision-Language Model with Multi-Task Experts | |
| Single Model | 83.78 | 68.86 | 51.06 | 32.2 | 110.31 | 30.55 | 59.86 | 14.49 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| FudanFVL | 83.9 | 68.77 | 50.84 | 32.17 | 108.29 | 30.64 | 59.82 | 14.72 | - | - |
| FudanWYZ | 82.95 | 67.45 | 49.58 | 31.38 | 106.81 | 30.32 | 59.18 | 14.56 | - | - |
| IEDA-LAB | 83.25 | 67.3 | 48.41 | 29.27 | 98.08 | 28.92 | 58.56 | 13.9 | - | - |
| firethehole | 80.77 | 65.55 | 48.14 | 30.2 | 97.61 | 30.07 | 58.25 | 14.74 | - | - |
| vll@mk514 | 81.61 | 65.1 | 46.13 | 27.32 | 93.45 | 28.46 | 57.4 | 14.06 | - | - |
| MD | 82.43 | 66.25 | 47.18 | 28.2 | 93.0 | 28.09 | 57.57 | 13.35 | - | - |
| VinVL (Microsoft Cognitive Services + MSR) | 81.59 | 65.15 | 45.04 | 26.15 | 92.46 | 27.57 | 56.96 | 13.07 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 81.03 | 64.62 | 45.26 | 26.52 | 87.56 | 27.36 | 56.7 | 12.81 | - | - |
| icgp2ssi1_coco_si_0.02_5_test | 79.0 | 61.95 | 42.36 | 24.62 | 87.34 | 26.29 | 55.03 | 12.01 | - | - |
| evertyhing | 78.92 | 61.6 | 41.52 | 23.52 | 86.0 | 26.31 | 54.75 | 12.1 | - | - |
| Human | 76.64 | 56.46 | 36.37 | 19.48 | 85.34 | 28.15 | 52.83 | 14.67 | - | - |
| RCAL | 78.19 | 60.74 | 39.11 | 21.24 | 82.88 | 25.72 | 53.85 | 12.2 | - | - |
| Oscar | 79.57 | 60.83 | 38.83 | 21.02 | 80.93 | 25.33 | 54.07 | 11.29 | - | - |
| vinvl_yuan_cbs | 79.32 | 60.95 | 39.5 | 20.3 | 79.04 | 25.44 | 53.8 | 11.9 | - | - |