3 months ago

Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data

Yucheng Shi Quanzheng Li Jin Sun Xiang Li Ninghao Liu

Abstract

Large multimodal models (LMMs) have shown impressive capabilities in a wide range of visual tasks. However, they often struggle with fine-grained visual reasoning, failing to identify domain-specific objectives and provide justifiable explanations for their predictions. To address this, we propose a novel visual rejection sampling framework to improve the cognition and explainability of LMMs using self-synthesized data. Specifically, visual fine-tuning requires images, queries, and target answers. Our approach begins by synthesizing interpretable answers that include human-verifiable visual features. These features are based on expert-defined concepts, carefully selected based on their alignment with the image content. After each round of fine-tuning, we apply a reward model-free filtering mechanism to select the highest-quality interpretable answers for the next round of tuning. This iterative process of data synthesis and fine-tuning progressively improves the model's ability to generate accurate and reasonable explanations. Experimental results demonstrate the effectiveness of our method in improving both the accuracy and explainability of specialized visual classification tasks.
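
The two selection steps named in the abstract (choosing expert-defined concepts by their alignment with the image, then filtering synthesized answers without a reward model) might be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the alignment scores are assumed to be given, and the names `Candidate`, `select_concepts`, `concept_coverage`, and `filter_candidates`, as well as the "correct label, then most concepts mentioned" criterion, are hypothetical stand-ins for whatever the paper actually uses.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A synthesized answer: explanation text plus the label it predicts."""
    text: str
    predicted_label: str


def select_concepts(scores, top_k=2):
    """Keep the top_k concepts with the highest image-alignment score.

    `scores` maps concept name -> alignment score. How the score is
    computed is left open here (assumption: it is supplied by the caller).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]


def concept_coverage(text, concepts):
    """Count how many of the selected concepts the answer text mentions."""
    lowered = text.lower()
    return sum(1 for c in concepts if c.lower() in lowered)


def filter_candidates(candidates, true_label, concepts):
    """Hypothetical reward-model-free filter.

    Rejects answers whose predicted label is wrong, then keeps the
    remaining answer that mentions the most selected concepts.
    Returns None when every candidate is rejected.
    """
    correct = [c for c in candidates if c.predicted_label == true_label]
    if not correct:
        return None
    return max(correct, key=lambda c: concept_coverage(c.text, concepts))


# Toy data; concept names and scores are made up for illustration.
scores = {"red crown": 0.9, "webbed feet": 0.1, "black wings": 0.7}
concepts = select_concepts(scores, top_k=2)

candidates = [
    Candidate("It has a red crown and black wings.", "sparrow"),
    Candidate("It has a red crown.", "woodpecker"),
    Candidate("It has a red crown and black wings.", "woodpecker"),
]
best = filter_candidates(candidates, "woodpecker", concepts)
```

In an iterative setup, the kept answers would serve as fine-tuning targets for the next round, after which new candidates are sampled from the updated model and filtered again.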

Code Repositories

sycny/selfsynthx

Official

pytorch

Benchmarks

Benchmark	Methodology	Metrics
fine-grained-visual-recognition-on-cub-200-1	Selfsynthx	Accuracy (%): 85.02
fine-grained-visual-recognition-on-fgvc-2	Selfsynthx	Accuracy (%): 91.99
fine-grained-visual-recognition-on-new-plant	Selfsynthx	Accuracy (%): 97.16
fine-grained-visual-recognition-on-stanford-2	Selfsynthx	Accuracy (%): 86.91
pneumonia-detection-on-chest-x-ray-images-1	Selfsynthx	Accuracy: 98.72
