Image Captioning On Nocaps Near Domain

Evaluation Metrics

B1
B2
B3
B4
CIDEr
METEOR
ROUGE-L
SPICE

Results

Performance of each model on this benchmark

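| Model | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GIT2, Single Model | 88.9 | 75.86 | 58.9 | 38.95 | 125.51 | 32.95 | 63.66 | 16.11 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| GIT, Single Model | 88.56 | 75.48 | 58.46 | 38.44 | 123.92 | 32.86 | 63.5 | 15.96 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| PaLI | - | - | - | - | - | - | - | 15.75 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| PaLI | 88.57 | 75.56 | 58.99 | 39.98 | 124.35 | 33.47 | 63.99 | 15.75 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| CoCa - Google Brain | 87.53 | 74.49 | 57.89 | 38.92 | 120.73 | 32.71 | 62.91 | 15.54 | - | - |
| Microsoft Cognitive Services team | 86.48 | 72.6 | 55.26 | 36.31 | 115.54 | 31.8 | 61.9 | 15.06 | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - |
| firethehole | 81.62 | 66.65 | 49.39 | 31.42 | 99.51 | 30.48 | 58.83 | 14.88 | - | - |
| FudanFVL | 84.47 | 69.66 | 51.95 | 33.46 | 109.33 | 31.08 | 60.34 | 14.79 | - | - |
| Human | 77.05 | 56.97 | 36.84 | 19.85 | 84.58 | 28.42 | 53.06 | 14.72 | - | - |
| FudanWYZ | 83.71 | 68.56 | 50.9 | 32.72 | 108.04 | 30.79 | 59.8 | 14.71 | - | - |
| Single Model | 84.36 | 69.83 | 52.42 | 33.74 | 110.76 | 30.97 | 60.46 | 14.61 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| vll@mk514 | 82.55 | 66.55 | 47.8 | 29.0 | 95.69 | 29.11 | 58.22 | 14.37 | - | - |
| IEDA-LAB | 84.04 | 68.58 | 49.98 | 30.78 | 100.15 | 29.53 | 59.23 | 14.15 | - | - |
| MD | 83.58 | 67.99 | 49.29 | 29.96 | 95.73 | 28.84 | 58.47 | 13.64 | - | - |
| VinVL (Microsoft Cognitive Services + MSR) | 82.77 | 66.94 | 47.02 | 27.97 | 95.16 | 28.24 | 57.95 | 13.36 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 81.93 | 65.88 | 46.72 | 27.94 | 89.87 | 27.89 | 57.34 | 12.98 | - | - |
| RCAL | 79.21 | 62.26 | 40.77 | 22.56 | 84.0 | 26.3 | 54.62 | 12.47 | - | - |
| evertyhing | 79.67 | 62.73 | 42.87 | 24.8 | 85.89 | 26.68 | 55.37 | 12.24 | - | - |
| camel XE | 79.21 | 62.06 | 42.51 | 25.06 | 79.14 | 26.87 | 55.24 | 12.14 | - | - |
| vinvl_yuan_cbs | 80.24 | 62.31 | 41.07 | 21.53 | 80.21 | 25.98 | 54.52 | 12.12 | - | - |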