Image Captioning On Nocaps Entire

Evaluation Metrics

B1
B2
B3
B4
CIDEr
METEOR
ROUGE-L
SPICE
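
As a rough illustration of what the B1 to B4 columns measure, the sketch below computes the clipped n-gram precision that BLEU-n is built on. It is a simplified, assumed implementation: it omits the brevity penalty and the averaging over n-gram orders, it uses whitespace tokenization, and the example captions and the names `ngrams` and `clipped_precision` are hypothetical. It is not the scoring code behind this leaderboard.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Clipped n-gram precision of one candidate caption against several references.

    Simplification: whitespace tokenization, no brevity penalty.
    """
    cand_counts = ngrams(candidate.split(), n)
    if not cand_counts:
        return 0.0
    # For each n-gram, keep the highest count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as it appears in a reference.
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


# Hypothetical captions, for illustration only.
candidate = "a dog runs on the grass"
references = ["a dog is running on the grass", "a brown dog runs across a field"]

p1 = clipped_precision(candidate, references, 1)
p2 = clipped_precision(candidate, references, 2)
print(p1)  # 1.0
print(p2)  # 0.8
```

Every unigram of the candidate appears in some reference, so the unigram precision is 1.0, while the bigram "runs on" appears in neither reference, which lowers the bigram precision to 4/5.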

Evaluation Results

Performance of each model on this benchmark.

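| Model | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| Lyrics | - | - | - | - | 126.8 | - | - | - | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | - |
| GIT, Single Model | 88.1 | 74.81 | 57.68 | 37.35 | 123.39 | 32.5 | 63.12 | 15.94 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| CoCa - Google Brain | 87.01 | 73.71 | 56.88 | 37.71 | 120.55 | 32.29 | 62.52 | 15.47 | - | - |
| Microsoft Cognitive Services team | 85.62 | 71.36 | 53.62 | 34.65 | 114.25 | 31.27 | 61.2 | 14.85 | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - |
| Prismer | 84.87 | 69.99 | 52.48 | 33.66 | 110.84 | 31.13 | 60.55 | 14.91 | Prismer: A Vision-Language Model with Multi-Task Experts | |
| Single Model | 83.78 | 68.86 | 51.06 | 32.2 | 110.31 | 30.55 | 59.86 | 14.49 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| FudanFVL | 83.9 | 68.77 | 50.84 | 32.17 | 108.29 | 30.64 | 59.82 | 14.72 | - | - |
| FudanWYZ | 82.95 | 67.45 | 49.58 | 31.38 | 106.81 | 30.32 | 59.18 | 14.56 | - | - |
| IEDA-LAB | 83.25 | 67.3 | 48.41 | 29.27 | 98.08 | 28.92 | 58.56 | 13.9 | - | - |
| firethehole | 80.77 | 65.55 | 48.14 | 30.2 | 97.61 | 30.07 | 58.25 | 14.74 | - | - |
| vll@mk514 | 81.61 | 65.1 | 46.13 | 27.32 | 93.45 | 28.46 | 57.4 | 14.06 | - | - |
| MD | 82.43 | 66.25 | 47.18 | 28.2 | 93.0 | 28.09 | 57.57 | 13.35 | - | - |
| VinVL (Microsoft Cognitive Services + MSR) | 81.59 | 65.15 | 45.04 | 26.15 | 92.46 | 27.57 | 56.96 | 13.07 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 81.03 | 64.62 | 45.26 | 26.52 | 87.56 | 27.36 | 56.7 | 12.81 | - | - |
| icgp2ssi1_coco_si_0.02_5_test | 79.0 | 61.95 | 42.36 | 24.62 | 87.34 | 26.29 | 55.03 | 12.01 | - | - |
| evertyhing | 78.92 | 61.6 | 41.52 | 23.52 | 86.0 | 26.31 | 54.75 | 12.1 | - | - |
| Human | 76.64 | 56.46 | 36.37 | 19.48 | 85.34 | 28.15 | 52.83 | 14.67 | - | - |
| RCAL | 78.19 | 60.74 | 39.11 | 21.24 | 82.88 | 25.72 | 53.85 | 12.2 | - | - |
| Oscar | 79.57 | 60.83 | 38.83 | 21.02 | 80.93 | 25.33 | 54.07 | 11.29 | - | - |
| vinvl_yuan_cbs | 79.32 | 60.95 | 39.5 | 20.3 | 79.04 | 25.44 | 53.8 | 11.9 | - | - |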