HyperAI

5 months ago

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Abstract

The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges still prevent MLLMs from being practical in real-world applications. The most notable challenge is the huge cost of running an MLLM with a massive number of parameters and extensive computation. As a result, most MLLMs must be deployed on high-performance cloud servers, which greatly limits their use in mobile, offline, energy-sensitive, and privacy-protective scenarios. In this work, we present MiniCPM-V, a series of efficient MLLMs deployable on end-side devices. By integrating the latest MLLM techniques in architecture, pretraining, and alignment, the latest MiniCPM-Llama3-V 2.5 has several notable features: (1) strong performance, outperforming GPT-4V-1106, Gemini Pro, and Claude 3 on OpenCompass, a comprehensive evaluation over 11 popular benchmarks; (2) strong OCR capability and 1.8M-pixel high-resolution image perception at any aspect ratio; (3) trustworthy behavior with low hallucination rates; (4) multilingual support for 30+ languages; and (5) efficient deployment on mobile phones. More importantly, MiniCPM-V can be viewed as a representative example of a promising trend: the model sizes needed to achieve usable (e.g., GPT-4V-level) performance are rapidly decreasing, while end-side computation capacity is growing fast. Together, these trends show that GPT-4V-level MLLMs deployed on end devices are becoming increasingly feasible, unlocking a wider spectrum of real-world AI applications in the near future.
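To make "the huge cost of running an MLLM with a massive number of parameters" concrete, the sketch below estimates weight-only memory at several numeric precisions. It is an illustrative calculation, not the paper's deployment method. The 8-billion parameter count is an assumption borrowed from the "MiniCPM-V 2.6 (8B)" benchmark entry on this page. The precisions are generic choices, not ones the abstract names. The helper `weight_memory_gib` is hypothetical. Activations, the KV cache, and runtime overhead are ignored.

```python
# Hypothetical helper: estimates weight-only memory for a dense model.
# Assumption: 8e9 parameters (taken from the "8B" benchmark label on this page);
# activations, KV cache, and runtime buffers are deliberately ignored.


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed to store the weights, in GiB (2**30 bytes)."""
    total_bytes = num_params * bits_per_param / 8  # bits -> bytes
    return total_bytes / 2**30


if __name__ == "__main__":
    params = 8e9  # assumed parameter count
    # Generic precisions chosen for illustration only.
    for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
        print(f"{label:>9}: {weight_memory_gib(params, bits):5.1f} GiB")
```

Under these assumptions, the weights alone take about 15 GiB at 16-bit precision but under 4 GiB at 4-bit precision. This gap helps explain why parameter count and numeric precision dominate feasibility on phone-class memory budgets.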

Code Repositories

OpenBMB/MiniCPM-o (official, PyTorch)
openbmb/minicpm-v (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
temporal-relation-extraction-on-vinoground | MiniCPM-2.6 | Group Score: 11.2; Text Score: 32.6; Video Score: 29.2
zero-shot-video-question-answer-on-video-mme-1 | MiniCPM-V 2.6 (8B) | Accuracy (%): 63.7
