Visual Question Answering on BenchLMM
Evaluation Metric
GPT-3.5 score
Evaluation Results
Performance of each model on this benchmark
| Model | GPT-3.5 score | Paper Title | Repository |
|---|---|---|---|
| GPT-4V | 58.37 | GPT-4 Technical Report | |
| Sphinx-V2-1K | 57.43 | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | |
| LLaVA-1.5-13B | 55.53 | Improved Baselines with Visual Instruction Tuning | |
| LLaVA-1.5-7B | 46.83 | Visual Instruction Tuning | |
| InstructBLIP-13B | 45.03 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
| InstructBLIP-7B | 44.63 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
| LLaVA-1-13B | 43.50 | Visual Instruction Tuning | |
| Otter-7B | 39.13 | Otter: A Multi-Modal Model with In-Context Instruction Tuning | |
| MiniGPT4-13B | 34.93 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | |
| MiniGPTv2-7B | 30.1 | MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | |