4 months ago

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

Abstract

In this paper, we introduce BMMR, a large-scale bilingual, multimodal, multi-disciplinary reasoning dataset for the community to develop and evaluate large multimodal models (LMMs). BMMR comprises 110k college-level questions spanning 300 UNESCO-defined subjects, covering diverse formats (multiple-choice, fill-in-the-blank, and open-ended QA) and sourced from both print and digital media such as books, exams, and quizzes. All data are curated and filtered via a scalable human-in-the-loop framework, and each instance is paired with a high-quality reasoning path. The dataset is organized into two parts: BMMR-Eval, which comprises 20,458 high-quality instances to comprehensively assess LMMs' knowledge and reasoning across multiple disciplines in both Chinese and English; and BMMR-Train, which contains 88,991 instances to support further research and development, extending the current focus on mathematical reasoning to diverse disciplines and domains. In addition, we propose the process-based multi-discipline verifier (i.e., BMMR-Verifier) for accurate and fine-grained evaluation of reasoning paths. Extensive experiments on 24 models reveal that (i) even SOTA models (e.g., o3 and Gemini-2.5-Pro) leave substantial headroom on BMMR-Eval; (ii) reasoning models exhibit discipline bias and outperform LMMs only on specific subjects; (iii) open-source models still trail their proprietary counterparts; and (iv) fine-tuning on BMMR-Train narrows this gap. Additionally, we conduct reasoning-chain analyses using BMMR-Verifier and other in-depth studies, uncovering the challenges LMMs currently face in multidisciplinary reasoning. We will release the data, and we hope our work can offer insights and contributions to the community.
