Visual Question Answering on OK-VQA

Evaluation Metric

Accuracy
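
The page does not say how Accuracy is computed. As a minimal sketch, assuming the entries follow the soft accuracy commonly used for VQA-style benchmarks (an answer scores min(matches / 3, 1) against the human answers), the metric could look like the following. The function names and the simple lowercase normalization are illustrative assumptions, not the official evaluation script.

```python
from typing import Sequence


def vqa_soft_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """Simplified VQA-style soft accuracy for one question (assumed metric).

    Returns min(number of matching human answers / 3, 1).
    Normalization here is only strip + lowercase, which is an assumption.
    """
    normalized = prediction.strip().lower()
    matches = sum(1 for answer in human_answers if answer.strip().lower() == normalized)
    return min(matches / 3.0, 1.0)


def benchmark_accuracy(
    predictions: Sequence[str], references: Sequence[Sequence[str]]
) -> float:
    """Mean soft accuracy over all questions, expressed as a percentage."""
    scores = [vqa_soft_accuracy(p, refs) for p, refs in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)
```

Under this assumption, a score such as 66.8 would be the mean per-question soft accuracy multiplied by 100.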

Evaluation Results

Performance of each model on this benchmark

| Model | Accuracy | Paper Title | Repository |
|---|---|---|---|
| PaLI-X-VPD | 66.8 | Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | - |
| PaLM-E-562B | 66.1 | PaLM-E: An Embodied Multimodal Language Model | |
| PaLI-X (Single-task FT) | 66.1 | PaLI-X: On Scaling up a Multilingual Vision and Language Model | |
| PaLI 17B | 64.5 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| Prophet | 62.5 | Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering | |
| RA-VQA-v2 (BLIP 2) | 62.08 | Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | |
| A Simple Baseline for KB-VQA | 61.2 | A Simple Baseline for Knowledge-Based Visual Question Answering | - |
| PromptCap | 60.4 | PromptCap: Prompt-Guided Task-Aware Image Captioning | |
| ReVeaL WIT + CC12M + Wikidata + VQA-2 | 59.1 | REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory | |
| Lyrics | 58.2 | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | - |
| REVIVE (Ensemble) | 58.0 | REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering | |
| REVIVE (Single) | 56.6 | REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering | |
| RA-VQA-v2 (T5-large) | 54.85 | Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | |
| RA-VQA (T5-large) | 54.48 | Retrieval Augmented Visual Question Answering with Outside Knowledge | |
| VK-OOD | 52.4 | Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | - |
| VK-OOD | 52.4 | Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | |
| RA-VQA-FrDPR (T5-large) | 51.22 | Retrieval Augmented Visual Question Answering with Outside Knowledge | |
| Flamingo80B | 50.6 | Flamingo: a Visual Language Model for Few-Shot Learning | |
| TRiG (T5-Large) | 50.50 | Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering | - |
| HYDRA | 48.6 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | |