2 months ago

SAIL-VL2 Technical Report

Abstract

We introduce SAIL-VL2, an open-suite vision-language foundation model (LVM) for comprehensive multimodal understanding and reasoning. As the successor to SAIL-VL, SAIL-VL2 achieves state-of-the-art performance at the 2B and 8B parameter scales across diverse image and video benchmarks, demonstrating strong capabilities from fine-grained perception to complex reasoning. Three core innovations drive its effectiveness. First, a large-scale data curation pipeline with scoring and filtering strategies enhances both quality and distribution across captioning, OCR, QA, and video data, improving training efficiency. Second, a progressive training framework begins with a powerful pre-trained vision encoder (SAIL-ViT), advances through multimodal pre-training, and culminates in a thinking-fusion SFT-RL hybrid paradigm that systematically strengthens model capabilities. Third, architectural advances extend beyond dense LLMs to efficient sparse Mixture-of-Experts (MoE) designs. With these contributions, SAIL-VL2 demonstrates competitive performance across 106 datasets and achieves state-of-the-art results on challenging reasoning benchmarks such as MMMU and MathVista. Furthermore, on the OpenCompass leaderboard, SAIL-VL2-2B ranks first among officially released open-source models under the 4B parameter scale, while serving as an efficient and extensible foundation for the open-source multimodal community.

Benchmarks

Benchmark | Methodology | Metrics
optical-character-recognition-on-ocrbench-v2-chinese | SAIL-VL2-8B | Accuracy: 57.6
optical-character-recognition-on-ocrbench-v2-english | SAIL-VL2-8B | Accuracy: 49.3
