Image Captioning on nocaps Out-of-Domain

Evaluation Metrics

CIDEr
SPICE

Evaluation Results

Performance of each model on this benchmark

| Model | CIDEr | SPICE | Paper Title | Repository |
|---|---|---|---|---|
| PaLI | 126.67 | 15.49 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| GIT2, Single Model | 122.27 | 15.62 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| GIT, Single Model | 122.04 | 15.7 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| CoCa - Google Brain | 121.69 | 15.13 | - | - |
| Microsoft Cognitive Services team | 110.14 | 13.74 | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - |
| Single Model | 109.49 | 13.89 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| FudanFVL | 106.55 | 14.21 | - | - |
| FudanWYZ | 103.75 | 13.75 | - | - |
| Human | 91.62 | 14.21 | - | - |
| firethehole | 88.54 | 13.87 | - | - |
| IEDA-LAB | 87.51 | 12.52 | - | - |
| icgp2ssi1_coco_si_0.02_5_test | 87.15 | 11.43 | - | - |
| evertyhing | 85.18 | 11.18 | - | - |
| vll@mk514 | 78.91 | 12.14 | - | - |
| VinVL (Microsoft Cognitive Services + MSR) | 78.01 | 11.48 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| MD | 77.39 | 11.59 | - | - |
| RCAL | 75.39 | 10.68 | - | - |
| Oscar | 73.75 | 9.72 | - | - |
| GRIT (zero-shot, no CBS, no VL pretraining, single model) | 72.6 | 11.1 | GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 72.13 | 11.53 | - | - |