3 months ago

Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data

Yucheng Shi, Quanzheng Li, Jin Sun, Xiang Li, Ninghao Liu


Abstract

Large multimodal models (LMMs) have shown impressive capabilities in a wide range of visual tasks. However, they often struggle with fine-grained visual reasoning, failing to identify domain-specific objectives and provide justifiable explanations for their predictions. To address this, we propose a novel visual rejection sampling framework to improve the cognition and explainability of LMMs using self-synthesized data. Specifically, visual fine-tuning requires images, queries, and target answers. Our approach begins by synthesizing interpretable answers that include human-verifiable visual features. These features are based on expert-defined concepts, carefully selected based on their alignment with the image content. After each round of fine-tuning, we apply a reward model-free filtering mechanism to select the highest-quality interpretable answers for the next round of tuning. This iterative process of data synthesis and fine-tuning progressively improves the model's ability to generate accurate and reasonable explanations. Experimental results demonstrate the effectiveness of our method in improving both the accuracy and explainability of specialized visual classification tasks.
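
The filtering step between rounds can be illustrated with a rough sketch. The abstract does not specify how answers are scored, so everything below is an assumption made for illustration: the function names (`concept_coverage`, `select_best_answer`, `build_round_dataset`), the substring-based coverage score, and the rule that a candidate must mention the ground-truth label are all hypothetical and are not taken from the paper or from the sycny/selfsynthx repository.

```python
from typing import Callable, Dict, List, Optional, Sequence


def concept_coverage(answer: str, concepts: Sequence[str]) -> float:
    """Hypothetical score: fraction of the selected concepts named in the answer."""
    if not concepts:
        return 0.0
    text = answer.lower()
    hits = sum(1 for concept in concepts if concept.lower() in text)
    return hits / len(concepts)


def select_best_answer(
    candidates: Sequence[str], concepts: Sequence[str], label: str
) -> Optional[str]:
    """Filter without a reward model (assumed rule, not the paper's):
    reject candidates that do not mention the label, then keep the
    candidate that covers the most concepts."""
    valid = [a for a in candidates if label.lower() in a.lower()]
    if not valid:
        return None
    return max(valid, key=lambda a: concept_coverage(a, concepts))


def build_round_dataset(
    samples: Sequence[Dict],
    generate: Callable[[str, str], List[str]],
) -> List[Dict[str, str]]:
    """Sample candidate answers from the current model (`generate` is a
    stand-in for it) and keep one filtered answer per sample as the
    fine-tuning target for the next round."""
    dataset = []
    for sample in samples:
        candidates = generate(sample["image"], sample["query"])
        best = select_best_answer(candidates, sample["concepts"], sample["label"])
        if best is not None:
            dataset.append(
                {"image": sample["image"], "query": sample["query"], "answer": best}
            )
    return dataset
```

In a full loop, the output of `build_round_dataset` would be passed to a fine-tuning step, and the updated model would serve as `generate` in the following round. Samples for which no candidate survives the filter are simply skipped in this sketch.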

Code Repositories

sycny/selfsynthx
Official
pytorch

Benchmarks

Benchmark | Methodology | Metrics
fine-grained-visual-recognition-on-cub-200-1 | Selfsynthx | Accuracy (%): 85.02
fine-grained-visual-recognition-on-fgvc-2 | Selfsynthx | Accuracy (%): 91.99
fine-grained-visual-recognition-on-new-plant | Selfsynthx | Accuracy (%): 97.16
fine-grained-visual-recognition-on-stanford-2 | Selfsynthx | Accuracy (%): 86.91
pneumonia-detection-on-chest-x-ray-images-1 | Selfsynthx | Accuracy: 98.72
