4 months ago

Ovis-U1 Technical Report

Abstract

In this report, we introduce Ovis-U1, a 3-billion-parameter unified model that integrates multimodal understanding, text-to-image generation, and image editing capabilities. Building on the foundation of the Ovis series, Ovis-U1 incorporates a diffusion-based visual decoder paired with a bidirectional token refiner, enabling image generation tasks comparable to leading models like GPT-4o. Unlike some previous models that use a frozen MLLM for generation tasks, Ovis-U1 utilizes a new unified training approach starting from a language model. Compared to training solely on understanding or generation tasks, unified training yields better performance, demonstrating the enhancement achieved by integrating these two tasks. Ovis-U1 achieves a score of 69.6 on the OpenCompass Multi-modal Academic Benchmark, surpassing recent state-of-the-art models such as Ristretto-3B and SAIL-VL-1.5-2B. In text-to-image generation, it excels with scores of 83.72 and 0.89 on the DPG-Bench and GenEval benchmarks, respectively. For image editing, it achieves 4.00 and 6.42 on the ImgEdit-Bench and GEdit-Bench-EN, respectively. As the initial version of the Ovis unified model series, Ovis-U1 pushes the boundaries of multimodal understanding, generation, and editing.

Code Repositories

aidc-ai/ovis-u1 (official, PyTorch)
