Visual Instruction Following on LLaVA-Bench
Evaluation metric
avg score
Evaluation results
Performance of the various models on this benchmark.
| Model | Avg Score | Paper Title | Repository |
|---|---|---|---|
| CuMo-7B | 85.7 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | |
| ShareGPT4V-13B | 79.9 | ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | |
| ShareGPT4V-7B | 72.6 | ShareGPT4V: Improving Large Multi-Modal Models with Better Captions | |
| LLaVA-v1.5-13B | 70.7 | Improved Baselines with Visual Instruction Tuning | |
| LLaVA-v1.5-7B | 63.4 | Improved Baselines with Visual Instruction Tuning | |
| InstructBLIP-7B | 60.9 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
| InstructBLIP-13B | 58.2 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
| BLIP-2 | 38.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
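For context on the "avg score" column, the sketch below shows one way such a number is commonly aggregated in LLaVA-Bench-style evaluations. It assumes the protocol from the LLaVA paper, where a judge model rates both the candidate answer and a reference answer per question and the reported score is the candidate-to-reference ratio in percent; the function name and field layout here are illustrative, not taken from any specific evaluation tool.

```python
# Minimal sketch of an LLaVA-Bench-style "avg score" aggregation.
# Assumption: each question yields a (candidate_rating, reference_rating)
# pair from a judge model, and the leaderboard number is the percentage
# ratio of the mean candidate rating to the mean reference rating.

def avg_score(ratings: list[tuple[float, float]]) -> float:
    """ratings: (candidate_rating, reference_rating) pairs, one per question."""
    candidate_mean = sum(c for c, _ in ratings) / len(ratings)
    reference_mean = sum(r for _, r in ratings) / len(ratings)
    return 100.0 * candidate_mean / reference_mean

# Example with three judged questions; the result (84.3) falls in the
# same range as the scores reported in the table above.
print(round(avg_score([(7.0, 8.5), (6.5, 9.0), (8.0, 8.0)]), 1))
```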