HyperAI
Visual Question Answering on VQA v2 val
Evaluation metric: Accuracy
Results
Performance of each model on this benchmark
| Model | Accuracy | Paper Title | Repository |
|---|---|---|---|
| BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 65.2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| PNP-VQA | 63.3 | Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | |
| BLIP-2 ViT-G FlanT5 XL (zero-shot) | 63.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| BLIP-2 ViT-L FlanT5 XL (zero-shot) | 62.6 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| BLIP-2 ViT-G OPT 6.7B (zero-shot) | 54.3 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| BLIP-2 ViT-G OPT 2.7B (zero-shot) | 53.5 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| BLIP-2 ViT-L OPT 2.7B (zero-shot) | 50.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| FewVLM (zero-shot) | 47.7 | A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | |
| MetaLM | 41.1 | Language Models are General-Purpose Interfaces | |
| VLKD (ViT-B/16) | 38.6 | Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | - |
| Frozen | 29.5 | Multimodal Few-Shot Learning with Frozen Language Models | - |