Gagan Mundada, Yash Vishe, Amit Namburi, Xin Xu, Zachary Novack, Julian McAuley, Junda Wu

Abstract
Recent advances in Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, their reasoning abilities in the multimodal symbolic music domain remain largely unexplored. We introduce WildScore, the first in-the-wild multimodal symbolic music reasoning and analysis benchmark, designed to evaluate MLLMs' capacity to interpret real-world music scores and answer complex musicological queries. Each instance in WildScore is sourced from genuine musical compositions and accompanied by authentic user-generated questions and discussions, capturing the intricacies of practical music analysis. To support systematic evaluation, we propose a taxonomy comprising both high-level and fine-grained musicological ontologies. Furthermore, we frame complex music reasoning as multiple-choice question answering, enabling controlled and scalable assessment of MLLMs' symbolic music understanding. Empirical benchmarking of state-of-the-art MLLMs on WildScore reveals intriguing patterns in their visual-symbolic reasoning, uncovering both promising directions and persistent challenges for MLLMs in symbolic music reasoning and analysis. We release the dataset and code.