Visual Question Answering (VQA)
Visual Question Answering (VQA) is a computer vision task in which a system answers natural-language questions about images. The goal is for machines to understand image content and give accurate, coherent answers in natural language. VQA has practical value in human-computer interaction, intelligent assistants, and content understanding, where it extends what machines can do with visual input.
GQA Test2019
VQA v2 test-dev
Oscar
VQA v2 test-std
BEiT-3
OK-VQA
MetaLM
MSVD-QA
HCRN
MSRVTT-QA
HCRN
DocVQA test
Human
InfographicVQA
Gemini Ultra (pixel only)
GQA test-dev
CFR
VizWiz 2020 VQA
CLEVR
NS-VQA (1K programs)
A-OKVQA
InfiMM-Eval
GPT-4V
COCO Visual Question Answering (VQA) real images 1.0 open ended
IconQA
Patch-TRM
TextVQA test-standard
PaLI
VQA v2 val
BLIP-2 ViT-G FlanT5 XXL (zero-shot)
VCR (Q-A) test
VizWiz 2018
LXR955, No Ensemble
VQA-CP
CSS
COCO Visual Question Answering (VQA) real images 1.0 multiple choice
MCB 7 att.
VQA-CE
RandImg
VLM2-Bench
VCR (QA-R) test
UNITER (Large)
InfoSeek
VQA v1 test-dev
SAAA (ResNet)
VCR (Q-AR) test
GPT4RoI
IllusionVQA
GQA test-std
ProTo
VQA v1 test-std
SAAA (ResNet)
WHOOPS!
VizWiz 2020 Answerability
QLEVR
MAC
AutoHallusion
GPT-4V
CLEVR-Humans
MDETR
PMC-VQA
PlotQA-D1
COCO Visual Question Answering (VQA) real images 2.0 open ended
HDU-USYD-UNCC
COCO Visual Question Answering (VQA) abstract images 1.0 open ended
COCO Visual Question Answering (VQA) abstract 1.0 multiple choice
AI2D
Visual7W
CMN
HallusionBench
GPT-4V
PlotQA-D2
VCR (QA-R) dev
VL-BERT (Large)
VCR (Q-AR) dev
VL-BERT (Large)
F-VQA
ZS-F-VQA
FigureQA - test 1
PReFIL
VCR (Q-A) dev
VL-BERT (Large)
GRIT
DocVQA val
BERT LARGE Baseline
TGIF-QA
TDIUC
Accuracy
GQA
PEVL+
VQA-X
RetVQA
MI-BART
Visual Genome (pairs)
CMN
ArtQuest
PrefixLM with CLIP and T5
OVAD benchmark
EgoSchema
Lyra-Pro
COCO
MME
ActivityNet
BLIP-2 T5
Visual Genome (subjects)
Video MME
CORE-MM
MM-Vet
DVQA test-familiar
PReFIL (Oracle OCR)
DeepForm
MVBench
WebSRC
ZS-F-VQA
SAN † - hard mask
VizWiz 2018 Answerability
DocVQA
TextVQA
ImageNet