Image Captioning On Nocaps Near Domain
Evaluation Metrics
B1
B2
B3
B4
CIDEr
METEOR
ROUGE-L
SPICE
Evaluation Results
Performance of each model on this benchmark
| Model | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| GIT2, Single Model | 88.9 | 75.86 | 58.9 | 38.95 | 125.51 | 32.95 | 63.66 | 16.11 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| GIT, Single Model | 88.56 | 75.48 | 58.46 | 38.44 | 123.92 | 32.86 | 63.5 | 15.96 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| PaLI | - | - | - | - | - | - | - | 15.75 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| PaLI | 88.57 | 75.56 | 58.99 | 39.98 | 124.35 | 33.47 | 63.99 | 15.75 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| CoCa - Google Brain | 87.53 | 74.49 | 57.89 | 38.92 | 120.73 | 32.71 | 62.91 | 15.54 | - | - |
| Microsoft Cognitive Services team | 86.48 | 72.6 | 55.26 | 36.31 | 115.54 | 31.8 | 61.9 | 15.06 | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - |
| firethehole | 81.62 | 66.65 | 49.39 | 31.42 | 99.51 | 30.48 | 58.83 | 14.88 | - | - |
| FudanFVL | 84.47 | 69.66 | 51.95 | 33.46 | 109.33 | 31.08 | 60.34 | 14.79 | - | - |
| Human | 77.05 | 56.97 | 36.84 | 19.85 | 84.58 | 28.42 | 53.06 | 14.72 | - | - |
| FudanWYZ | 83.71 | 68.56 | 50.9 | 32.72 | 108.04 | 30.79 | 59.8 | 14.71 | - | - |
| Single Model | 84.36 | 69.83 | 52.42 | 33.74 | 110.76 | 30.97 | 60.46 | 14.61 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| vll@mk514 | 82.55 | 66.55 | 47.8 | 29.0 | 95.69 | 29.11 | 58.22 | 14.37 | - | - |
| IEDA-LAB | 84.04 | 68.58 | 49.98 | 30.78 | 100.15 | 29.53 | 59.23 | 14.15 | - | - |
| MD | 83.58 | 67.99 | 49.29 | 29.96 | 95.73 | 28.84 | 58.47 | 13.64 | - | - |
| VinVL (Microsoft Cognitive Services + MSR) | 82.77 | 66.94 | 47.02 | 27.97 | 95.16 | 28.24 | 57.95 | 13.36 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 81.93 | 65.88 | 46.72 | 27.94 | 89.87 | 27.89 | 57.34 | 12.98 | - | - |
| RCAL | 79.21 | 62.26 | 40.77 | 22.56 | 84.0 | 26.3 | 54.62 | 12.47 | - | - |
| evertyhing | 79.67 | 62.73 | 42.87 | 24.8 | 85.89 | 26.68 | 55.37 | 12.24 | - | - |
| camel XE | 79.21 | 62.06 | 42.51 | 25.06 | 79.14 | 26.87 | 55.24 | 12.14 | - | - |
| vinvl_yuan_cbs | 80.24 | 62.31 | 41.07 | 21.53 | 80.21 | 25.98 | 54.52 | 12.12 | - | - |
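The rows above appear to be ordered by SPICE, so the ranking can differ when another metric is used. The following is a minimal sketch of re-ranking entries by a chosen metric. It assumes the column order B1, B2, B3, B4, CIDEr, METEOR, ROUGE-L, SPICE after the model name; `parse_rows` and `rank_by` are hypothetical helper names introduced here for illustration, and only a handful of rows are copied from the table.

```python
# Assumed column order for the eight score columns that follow the model name.
METRICS = ["B1", "B2", "B3", "B4", "CIDEr", "METEOR", "ROUGE-L", "SPICE"]

# A few rows copied from the table (paper/repository columns omitted).
ROWS = """\
| GIT2, Single Model | 88.9 | 75.86 | 58.9 | 38.95 | 125.51 | 32.95 | 63.66 | 16.11 |
| PaLI | 88.57 | 75.56 | 58.99 | 39.98 | 124.35 | 33.47 | 63.99 | 15.75 |
| firethehole | 81.62 | 66.65 | 49.39 | 31.42 | 99.51 | 30.48 | 58.83 | 14.88 |
| Human | 77.05 | 56.97 | 36.84 | 19.85 | 84.58 | 28.42 | 53.06 | 14.72 |
"""


def parse_rows(text):
    """Parse pipe-delimited rows into dicts; a '-' cell becomes None."""
    entries = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        name = cells[0]
        values = cells[1:1 + len(METRICS)]
        entry = {"model": name}
        for metric, value in zip(METRICS, values):
            entry[metric] = None if value == "-" else float(value)
        entries.append(entry)
    return entries


def rank_by(entries, metric):
    """Return entries that report `metric`, sorted from best to worst."""
    scored = [e for e in entries if e[metric] is not None]
    return sorted(scored, key=lambda e: e[metric], reverse=True)


if __name__ == "__main__":
    for entry in rank_by(parse_rows(ROWS), "B4"):
        print(entry["model"], entry["B4"])
```

Entries that report "-" for the chosen metric are skipped rather than ranked last, which is one possible design choice; a different treatment may suit other uses.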