5 months ago

Visual Question Answering

Method/Architecture

Gagan Mundada, Yash Vishe, Amit Namburi, Xin Xu, Zachary Novack, Julian McAuley, Junda Wu

Abstract

Recent advances in Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, their reasoning abilities in the multimodal symbolic music domain remain largely unexplored. We introduce WildScore, the first in-the-wild multimodal symbolic music reasoning and analysis benchmark, designed to evaluate MLLMs' capacity to interpret real-world music scores and answer complex musicological queries. Each instance in WildScore is sourced from genuine musical compositions and accompanied by authentic user-generated questions and discussions, capturing the intricacies of practical music analysis. To facilitate systematic evaluation, we propose a taxonomy comprising both high-level and fine-grained musicological ontologies. Furthermore, we frame complex music reasoning as multiple-choice question answering, enabling controlled and scalable assessment of MLLMs' symbolic music understanding. Empirical benchmarking of state-of-the-art MLLMs on WildScore reveals intriguing patterns in their visual-symbolic reasoning, uncovering both promising directions and persistent challenges for MLLMs in symbolic music reasoning and analysis. We release the dataset and code.
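The abstract frames music reasoning as multiple-choice question answering scored against a two-level taxonomy. The following is a minimal sketch of how such scoring could work, assuming a hypothetical item schema (`MCQItem`), invented taxonomy labels ("Harmony", "Rhythm"), and a naive letter-extraction heuristic (`extract_choice`). It omits the score image and the model call entirely and is not the WildScore release code.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MCQItem:
    # Hypothetical schema: field names are illustrative, not WildScore's actual format.
    question: str
    options: list[str]  # answer choices, lettered A, B, C, ... in order
    answer: str         # gold option letter, e.g. "B"
    category: str       # high-level taxonomy label (hypothetical value)
    subcategory: str    # fine-grained taxonomy label (hypothetical value)


def format_prompt(item: MCQItem) -> str:
    """Render a question and its lettered options as a text prompt."""
    lines = [item.question]
    for i, opt in enumerate(item.options):
        lines.append(f"{chr(ord('A') + i)}. {opt}")
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


def extract_choice(response: str, n_options: int) -> str | None:
    """Naive heuristic: return the first standalone capital option letter, or None."""
    valid = "".join(chr(ord("A") + i) for i in range(n_options))
    match = re.search(rf"\b([{valid}])\b", response)
    return match.group(1) if match else None


def accuracy_by_category(items: list[MCQItem], responses: list[str]) -> dict[str, float]:
    """Compute accuracy per high-level category from raw model responses."""
    if len(items) != len(responses):
        raise ValueError("items and responses must have the same length")
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item, resp in zip(items, responses):
        pred = extract_choice(resp, len(item.options))
        total[item.category] += 1
        if pred == item.answer:
            correct[item.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}


# Usage with made-up items and responses (not real benchmark data).
items = [
    MCQItem("Which cadence ends this excerpt?",
            ["Perfect authentic", "Plagal", "Half", "Deceptive"], "C", "Harmony", "Cadence"),
    MCQItem("What is the time signature?",
            ["3/4", "4/4", "6/8", "2/2"], "A", "Rhythm", "Meter"),
    MCQItem("Which interval is marked?",
            ["Major third", "Perfect fifth", "Minor sixth", "Octave"], "B", "Harmony", "Intervals"),
]
responses = ["The answer is C.", "B", "I think it's B, a perfect fifth."]
print(accuracy_by_category(items, responses))
```

Fixing the answer format to a single option letter is what makes this kind of assessment controlled and scalable: scoring reduces to string comparison rather than judging free-form explanations. A real harness would likely need a more robust answer parser than the single-regex heuristic shown here.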

WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning