Zhe Chen, Weiyun Wang, Yue Cao, Yangzhou Liu, Zhangwei Gao, Erfei Cui, Jinguo Zhu, Shenglong Ye, Hao Tian, Zhaoyang Liu, Lixin Gu, Xuehui Wang, Qingyun Li, Yimin Ren, Zixuan Chen, Jiapeng Luo, Jiahao Wang, Tan Jiang, Bo Wang, Conghui He, Botian Shi, Xingcheng Zhang, Han Lv, Yi Wang, Wenqi Shao, Pei Chu, Zhongying Tu, Tong He, Zhiyong Wu, Huipeng Deng, Jiaye Ge, Kai Chen, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang

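Abstract
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series that builds on InternVL 2.0. It keeps the core model architecture of its predecessor while significantly improving the training and testing strategies and the data quality. In this work, we systematically examine the relationship between model scale and performance, analyzing performance trends across vision encoders, language models, dataset sizes, and test-time configurations. In extensive evaluations on benchmarks covering multi-discipline reasoning, document understanding, multi-image and video understanding, real-world comprehension, multimodal hallucination detection, visual grounding, multilingual capabilities, and pure language processing, InternVL 2.5 shows competitive performance, rivaling leading commercial models such as GPT-4o and Claude-3.5-Sonnet. Notably, our model is the first open-source MLLM to exceed 70% on the MMMU benchmark; Chain-of-Thought (CoT) reasoning contributes a 3.7-point improvement, and the model shows strong potential for test-time scaling. We hope this model contributes to the open-source community by setting new standards for developing and applying multimodal AI systems. HuggingFace demo: https://huggingface.co/spaces/OpenGVLab/InternVL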
Code Repositories
opengvlab/internvl
Official
pytorch
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| video-question-answering-on-next-qa | InternVL-2.5(8B) | Accuracy: 85.5 |
| visual-question-answering-on-mm-vet | InternVL2.5-78B | GPT-4 score: 72.3; Params: 78B |
| visual-question-answering-on-mm-vet | InternVL2.5-38B | GPT-4 score: 68.8; Params: 38B |
| visual-question-answering-on-mm-vet | InternVL2.5-26B | GPT-4 score: 65.0; Params: 26B |
| visual-question-answering-on-mm-vet | InternVL2.5-2B | GPT-4 score: 60.8; Params: 2B |
| visual-question-answering-on-mm-vet | InternVL2.5-1B | GPT-4 score: 48.8; Params: 1B |
| visual-question-answering-on-mm-vet | InternVL2.5-4B | GPT-4 score: 60.6; Params: 4B |
| visual-question-answering-on-mm-vet | InternVL2.5-8B | GPT-4 score: 62.8; Params: 8B |
| visual-question-answering-vqa-on-vlm2-bench | InternVL2.5-26B | Average Score on VLM2-bench (9 subtasks): 45.59; GC-mat: 30.50; GC-trk: 30.59; OC-cnt: 51.48; OC-cpr: 43.33; OC-grp: 52.50; PC-VID: 21.75; PC-cnt: 59.70; PC-cpr: 59.50; PC-grp: 61.00 |
| visual-question-answering-vqa-on-vlm2-bench | InternVL2.5-8B | Average Score on VLM2-bench (9 subtasks): 41.23; GC-mat: 21.24; GC-trk: 26.03; OC-cnt: 55.23; OC-cpr: 53.33; OC-grp: 46.50; PC-VID: 5.25; PC-cnt: 60.00; PC-cpr: 51.50; PC-grp: 52.00 |
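
The VLM2-bench rows above report an average over nine subtasks alongside the per-subtask scores. The short sketch below recomputes that average from the listed values, assuming it is an unweighted arithmetic mean (the table does not state the aggregation rule; the dictionary layout and the `mean_score` helper are illustrative names, not part of any benchmark tooling).

```python
# Subtask names and scores copied from the VLM2-bench rows of the table above.
SUBTASKS = ["GC-mat", "GC-trk", "OC-cnt", "OC-cpr", "OC-grp",
            "PC-VID", "PC-cnt", "PC-cpr", "PC-grp"]

scores = {
    "InternVL2.5-26B": [30.50, 30.59, 51.48, 43.33, 52.50, 21.75, 59.70, 59.50, 61.00],
    "InternVL2.5-8B": [21.24, 26.03, 55.23, 53.33, 46.50, 5.25, 60.00, 51.50, 52.00],
}

# Averages as reported in the table.
reported = {"InternVL2.5-26B": 45.59, "InternVL2.5-8B": 41.23}


def mean_score(values):
    """Unweighted arithmetic mean (assumed aggregation rule)."""
    return sum(values) / len(values)


for model, values in scores.items():
    assert len(values) == len(SUBTASKS)
    avg = mean_score(values)
    print(f"{model}: computed {avg:.2f}, reported {reported[model]:.2f}")
```

Under this assumption, both computed means agree with the reported averages to two decimal places, which suggests that each of the nine subtasks carries equal weight in the headline score.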