5 months ago

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

Abstract

The explosion of visual content available online underscores the requirement for an accurate machine assessor to robustly evaluate scores across diverse types of visual contents. While recent studies have demonstrated the exceptional potentials of large multi-modality models (LMMs) on a wide range of related fields, in this work, we explore how to teach them for visual rating aligned with human opinions. Observing that human raters only learn and judge discrete text-defined levels in subjective studies, we propose to emulate this subjective process and teach LMMs with text-defined rating levels instead of scores. The proposed Q-Align achieves state-of-the-art performance on image quality assessment (IQA), image aesthetic assessment (IAA), as well as video quality assessment (VQA) tasks under the original LMM structure. With the syllabus, we further unify the three tasks into one model, termed the OneAlign. In our experiments, we demonstrate the advantage of the discrete-level-based syllabus over direct-score-based variants for LMMs. Our code and the pre-trained weights are released at https://github.com/Q-Future/Q-Align.
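The idea of training on text-defined levels instead of numeric scores can be illustrated with a small sketch. The abstract does not spell out the level names, the binning rule, or how a score is recovered at inference time, so everything below is an assumption made for illustration: five levels named after a common subjective rating scale, equal-width bins over the score range, and a probability-weighted average of level indices to map level logits back to a number. The function names `score_to_level` and `level_logits_to_score` are hypothetical and are not taken from the released code.

```python
import math

# Assumed level names, ordered worst to best (illustrative, not from the abstract).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]


def score_to_level(score, lo, hi):
    """Map a numeric opinion score in [lo, hi] to one of the equal-width level bins."""
    if not lo <= score <= hi:
        raise ValueError("score must lie inside [lo, hi]")
    width = (hi - lo) / len(LEVELS)
    # The top edge of the range belongs to the last bin, hence the clamp.
    index = min(int((score - lo) / width), len(LEVELS) - 1)
    return LEVELS[index]


def level_logits_to_score(logits):
    """Softmax over per-level logits, then the expected level index on a 1..5 scale."""
    if len(logits) != len(LEVELS):
        raise ValueError("expected one logit per level")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(p * (i + 1) for i, p in enumerate(probs))


if __name__ == "__main__":
    # Training side: a mean opinion score of 72 on a 0-100 scale becomes a text label.
    print(score_to_level(72, 0, 100))
    # Inference side: hypothetical logits for the five level tokens become a scalar.
    print(round(level_logits_to_score([-1.0, 0.0, 1.0, 2.5, 1.5]), 3))
```

In this sketch the model only ever sees and emits level words, and the numeric score exists solely at the data-preparation and read-out boundaries.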

Code Repositories

q-future/q-align (official; framework: PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| aesthetics-quality-assessment-on-aesthetic | OneAlign | SRCC: 0.823 |
| image-quality-assessment-on-koniq-10k | OneAlign | PLCC: 0.952; SRCC: 0.941 |
| video-quality-assessment-on-live-fb-lsvq | OneAlign | PLCC: 0.886 |
| video-quality-assessment-on-live-fb-lsvq | OneAlign + FAST-VQA | PLCC: 0.900 |
| video-quality-assessment-on-msu-sr-qa-dataset | Q-Align (IQA) | KLCC: 0.61677; PLCC: 0.74116; SROCC: 0.75088; Type: NR |
| video-quality-assessment-on-msu-sr-qa-dataset | Q-Align (IAA) | KLCC: 0.42211; PLCC: 0.50055; SROCC: 0.51521; Type: NR |
| video-quality-assessment-on-msu-sr-qa-dataset | Q-Align (VQA) | KLCC: 0.58634; PLCC: 0.71121; SROCC: 0.71812; Type: NR |
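
For readers unfamiliar with the metrics in the benchmark listing, PLCC is the Pearson linear correlation and SRCC (also written SROCC) is the Spearman rank-order correlation between predicted scores and human opinion scores. The following is a minimal pure-Python sketch of both, assuming two equal-length lists of numbers; the helper names are hypothetical, and the listing does not say whether any score mapping was applied before the correlations were computed, so none is applied here.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation (PLCC) between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

PLCC rewards predictions that are linearly related to the human scores, while SRCC only checks that the ordering agrees, which is why the two can differ for the same model.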
