2 months ago

WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning

Gagan Mundada, Yash Vishe, Amit Namburi, Xin Xu, Zachary Novack, Julian McAuley, Junda Wu

Abstract

Recent advances in Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, their reasoning abilities in the multimodal symbolic music domain remain largely unexplored. We introduce WildScore, the first in-the-wild multimodal symbolic music reasoning and analysis benchmark, designed to evaluate MLLMs' capacity to interpret real-world music scores and answer complex musicological queries. Each instance in WildScore is sourced from genuine musical compositions and accompanied by authentic user-generated questions and discussions, capturing the intricacies of practical music analysis. To facilitate systematic evaluation, we propose a systematic taxonomy, comprising both high-level and fine-grained musicological ontologies. Furthermore, we frame complex music reasoning as multiple-choice question answering, enabling controlled and scalable assessment of MLLMs' symbolic music understanding. Empirical benchmarking of state-of-the-art MLLMs on WildScore reveals intriguing patterns in their visual-symbolic reasoning, uncovering both promising directions and persistent challenges for MLLMs in symbolic music reasoning and analysis. We release the dataset and code.
