Image Captioning On Nocaps In Domain

Evaluation Metrics

B1
B2
B3
B4
CIDEr
METEOR
ROUGE-L
SPICE

Evaluation Results

Performance of each model on this benchmark

| Model | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| PaLI | - | - | - | - | 149.1 | - | - | - | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| GIT2, Single Model | 88.86 | 75.86 | 59.94 | 41.1 | 124.18 | 33.83 | 63.82 | 16.36 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| GIT, Single Model | 88.55 | 76.1 | 60.53 | 41.65 | 122.4 | 33.41 | 64.02 | 16.18 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| PaLI | 88.02 | 75.21 | 59.38 | 41.16 | 121.09 | 34.22 | 64.39 | 15.69 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| CoCa - Google Brain | 87.27 | 74.29 | 58.01 | 39.24 | 117.9 | 33.01 | 63.12 | 15.49 | - | - |
| Microsoft Cognitive Services team | 86.33 | 72.83 | 55.94 | 37.97 | 112.82 | 32.7 | 62.48 | 15.22 | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - |
| Single Model | 84.64 | 70.0 | 52.96 | 34.66 | 108.98 | 31.97 | 61.01 | 14.6 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| GRIT (zero-shot, no VL pretraining, no CBS) | - | - | - | - | 105.9 | - | - | 13.6 | GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | |
| FudanFVL | 84.2 | 69.57 | 52.56 | 34.8 | 104.9 | 31.77 | 60.52 | 15.04 | - | - |
| FudanWYZ | 82.91 | 68.02 | 50.75 | 33.59 | 104.25 | 31.33 | 59.67 | 14.85 | - | - |
| IEDA-LAB | 84.4 | 69.8 | 51.89 | 32.86 | 102.64 | 30.43 | 60.07 | 14.47 | - | - |
| vll@mk514 | 83.77 | 68.7 | 51.26 | 32.76 | 101.69 | 30.51 | 59.75 | 14.99 | - | - |
| MD | 84.03 | 69.12 | 51.16 | 33.15 | 100.03 | 30.06 | 59.67 | 14.08 | - | - |
| firethehole | 81.86 | 67.2 | 50.5 | 34.11 | 99.9 | 31.61 | 59.54 | 15.17 | - | - |
| VinVL (Microsoft Cognitive Services + MSR) | 83.24 | 68.04 | 49.68 | 30.62 | 97.99 | 29.51 | 58.54 | 13.63 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 82.9 | 68.09 | 49.73 | 31.24 | 96.63 | 29.37 | 58.62 | 13.61 | - | - |
| camel XE | 80.5 | 64.48 | 46.46 | 29.59 | 88.08 | 28.7 | 56.84 | 13.04 | - | - |
| evertyhing | 79.58 | 63.09 | 43.92 | 26.07 | 87.86 | 27.97 | 55.88 | 12.6 | - | - |
| RCAL | 80.68 | 64.7 | 45.33 | 27.09 | 87.28 | 27.7 | 56.76 | 12.79 | - | - |
| icgp2ssi1_coco_si_0.02_5_test | 80.26 | 63.94 | 44.65 | 27.23 | 87.21 | 27.7 | 56.4 | 12.28 | - | - |