
5 months ago

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning


Abstract

Scientific discoveries increasingly rely on complex multimodal reasoning based on information-intensive scientific data and domain-specific expertise. Empowered by expert-level scientific benchmarks, scientific Multimodal Large Language Models (MLLMs) hold the potential to significantly enhance this discovery process in realistic workflows. However, current scientific benchmarks mostly focus on evaluating the knowledge understanding capabilities of MLLMs, leading to an inadequate assessment of their perception and reasoning abilities. To address this gap, we present the Scientists' First Exam (SFE) benchmark, designed to evaluate the scientific cognitive capacities of MLLMs through three interconnected levels: scientific signal perception, scientific attribute understanding, and scientific comparative reasoning. Specifically, SFE comprises 830 expert-verified VQA pairs across three question types, spanning 66 multimodal tasks in five high-value disciplines. Extensive experiments reveal that the current state-of-the-art models GPT-o3 and InternVL-3 achieve only 34.08% and 26.52% on SFE, respectively, highlighting significant room for MLLMs to improve in scientific realms. We hope the insights obtained in SFE will facilitate further developments in AI-enhanced scientific discoveries.
