Visual Question Answering on VCR (QA-R) Test
Evaluation Metric
Accuracy
Evaluation Results
Performance of each model on this benchmark
| Model | Accuracy (%) | Paper Title | Repository |
|---|---|---|---|
| GPT4RoI | 91.0 | GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | |
| ERNIE-ViL-large (ensemble of 15 models) | 86.1 | ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph | - |
| UNITER-large (ensemble of 10 models) | 83.4 | UNITER: UNiversal Image-TExt Representation Learning | |
| UNITER (Large) | 80.8 | UNITER: UNiversal Image-TExt Representation Learning | |
| KVL-BERTLARGE | 78.6 | KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning | - |
| VL-BERTLARGE | 78.4 | VL-BERT: Pre-training of Generic Visual-Linguistic Representations | |
| VL-T5 | 77.8 | Unifying Vision-and-Language Tasks via Text Generation | |
| VisualBERT | 73.2 | VisualBERT: A Simple and Performant Baseline for Vision and Language | |