Image Captioning on nocaps (in-domain)
Evaluation Metrics
- B1, B2, B3, B4 (BLEU-1 through BLEU-4)
- CIDEr
- METEOR
- ROUGE-L
- SPICE
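The sketch below shows how corpus-level scores for several of these metrics can be computed, assuming the open-source `pycocoevalcap` package (a common implementation of the COCO-caption scorers; METEOR and SPICE additionally require Java, so they are omitted here). The image ids and captions are illustrative placeholders, not data from this leaderboard.

```python
# Minimal scoring sketch, assuming `pip install pycocoevalcap`.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.rouge.rouge import Rouge

# Both dicts map an image id to a list of tokenized caption strings;
# the reference dict may contain several captions per image,
# while each candidate list must contain exactly one generated caption.
references = {
    "img1": ["a dog runs across a grassy field",
             "a brown dog is running on the grass"],
    "img2": ["a plate of food on a wooden table"],
}
candidates = {
    "img1": ["a dog running through a field"],
    "img2": ["a plate of food on a table"],
}

# BLEU-1 .. BLEU-4 (the B1-B4 columns): compute_score returns a list of four scores.
bleu_scores, _ = Bleu(4).compute_score(references, candidates)
for n, score in enumerate(bleu_scores, start=1):
    print(f"B{n}: {score:.4f}")

# CIDEr and ROUGE-L each return a single corpus-level score.
cider, _ = Cider().compute_score(references, candidates)
rouge_l, _ = Rouge().compute_score(references, candidates)
print(f"CIDEr: {cider:.4f}  ROUGE-L: {rouge_l:.4f}")
```

Leaderboard entries report these scores scaled by 100 (e.g. a CIDEr of 1.491 appears as 149.1).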
Evaluation Results
Performance of each model on this benchmark:
| Model | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| PaLI | - | - | - | - | 149.1 | - | - | - | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| GIT2, Single Model | 88.86 | 75.86 | 59.94 | 41.1 | 124.18 | 33.83 | 63.82 | 16.36 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| GIT, Single Model | 88.55 | 76.1 | 60.53 | 41.65 | 122.4 | 33.41 | 64.02 | 16.18 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| PaLI | 88.02 | 75.21 | 59.38 | 41.16 | 121.09 | 34.22 | 64.39 | 15.69 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| CoCa - Google Brain | 87.27 | 74.29 | 58.01 | 39.24 | 117.9 | 33.01 | 63.12 | 15.49 | - | - |
| Microsoft Cognitive Services team | 86.33 | 72.83 | 55.94 | 37.97 | 112.82 | 32.7 | 62.48 | 15.22 | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - |
| Single Model | 84.64 | 70.0 | 52.96 | 34.66 | 108.98 | 31.97 | 61.01 | 14.6 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| GRIT (zero-shot, no VL pretraining, no CBS) | - | - | - | - | 105.9 | - | - | 13.6 | GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | |
| FudanFVL | 84.2 | 69.57 | 52.56 | 34.8 | 104.9 | 31.77 | 60.52 | 15.04 | - | - |
| FudanWYZ | 82.91 | 68.02 | 50.75 | 33.59 | 104.25 | 31.33 | 59.67 | 14.85 | - | - |
| IEDA-LAB | 84.4 | 69.8 | 51.89 | 32.86 | 102.64 | 30.43 | 60.07 | 14.47 | - | - |
| vll@mk514 | 83.77 | 68.7 | 51.26 | 32.76 | 101.69 | 30.51 | 59.75 | 14.99 | - | - |
| MD | 84.03 | 69.12 | 51.16 | 33.15 | 100.03 | 30.06 | 59.67 | 14.08 | - | - |
| firethehole | 81.86 | 67.2 | 50.5 | 34.11 | 99.9 | 31.61 | 59.54 | 15.17 | - | - |
| VinVL (Microsoft Cognitive Services + MSR) | 83.24 | 68.04 | 49.68 | 30.62 | 97.99 | 29.51 | 58.54 | 13.63 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 82.9 | 68.09 | 49.73 | 31.24 | 96.63 | 29.37 | 58.62 | 13.61 | - | - |
| camel XE | 80.5 | 64.48 | 46.46 | 29.59 | 88.08 | 28.7 | 56.84 | 13.04 | - | - |
| evertyhing | 79.58 | 63.09 | 43.92 | 26.07 | 87.86 | 27.97 | 55.88 | 12.6 | - | - |
| RCAL | 80.68 | 64.7 | 45.33 | 27.09 | 87.28 | 27.7 | 56.76 | 12.79 | - | - |
| icgp2ssi1_coco_si_0.02_5_test | 80.26 | 63.94 | 44.65 | 27.23 | 87.21 | 27.7 | 56.4 | 12.28 | - | - |