Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data

Abstract

Vision-Language Models (VLMs) have recently made significant progress, but the limited scale and quality of open-source instruction data hinder their performance compared to closed-source models. In this work, we address this limitation by introducing Infinity-MM, a large-scale multimodal instruction dataset with 40 million samples, enhanced through rigorous quality filtering and deduplication. We also propose a synthetic instruction generation method based on open-source VLMs, using detailed image annotations and diverse question generation. Using this data, we trained a 2-billion-parameter VLM, Aquila-VL-2B, achieving state-of-the-art (SOTA) performance for models of similar scale. This demonstrates that expanding instruction data and generating synthetic data can significantly improve the performance of open-source models.
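The abstract mentions deduplication as part of the dataset curation, without specifying the method. As a minimal illustrative sketch (not the paper's actual pipeline), one common approach at this scale is exact-match deduplication by hashing normalized instruction text; the function and sample names below are hypothetical:

```python
import hashlib

def dedup_exact(samples):
    """Drop exact duplicate instruction samples via hashing.

    Illustrative sketch only: the paper does not specify
    Infinity-MM's deduplication method.
    """
    seen = set()
    unique = []
    for text in samples:
        # Normalize (strip whitespace, lowercase) before hashing
        # so trivially different copies collide.
        key = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

samples = [
    "What is in the image?",
    "what is in the image? ",   # near-identical copy, removed
    "Describe the scene.",
]
print(dedup_exact(samples))  # two unique samples remain
```

Real pipelines often combine such exact matching with near-duplicate detection (e.g. MinHash over n-grams), but the principle of hashing a normalized key is the same.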
