5 hours ago

Abstract

Unified multimodal models integrating visual understanding and generation face a fundamental challenge: visual generation incurs substantially higher computational costs than understanding, particularly for video. This imbalance motivates us to invert the conventional paradigm: rather than extending understanding-centric MLLMs to support generation, we propose Uni-ViGU, a framework that unifies video generation and understanding by extending a video generator as the foundation. We introduce a unified flow method that performs continuous flow matching for video and discrete flow matching for text within a single process, enabling coherent multimodal generation. We further propose a modality-driven MoE-based framework that augments Transformer blocks with lightweight layers for text generation while preserving generative priors. To repurpose generation knowledge for understanding, we design a bidirectional training mechanism with two stages: Knowledge Recall reconstructs input prompts to leverage learned text-video correspondences, while Capability Refinement fine-tunes on detailed captions to establish discriminative shared representations. Experiments demonstrate that Uni-ViGU achieves competitive performance on both video generation and understanding, validating generation-centric architectures as a scalable path toward unified multimodal intelligence. Project Page and Code: https://fr0zencrane.github.io/uni-vigu-page/.

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

5 hours ago

Luozheng Qin Jia Gong Qian Qiao Tianjiao Li Li Xu Haoyu Pan Chao Qu Zhiyu Tan Hao Li

Abstract

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

5 hours ago

Luozheng Qin Jia Gong Qian Qiao Tianjiao Li Li Xu Haoyu Pan Chao Qu Zhiyu Tan Hao Li

Abstract

Source PDF View Code

Build AI with AI

From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.

AI Co-coding

Ready-to-use GPUs

Best Pricing

Get Started View Pricing

HyperAI Newsletters

Subscribe to our latest updates

We will deliver the latest updates of the week to your inbox at nine o'clock every Monday morning

Command Palette

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

Luozheng Qin Jia Gong Qian Qiao Tianjiao Li Li Xu Haoyu Pan Chao Qu Zhiyu Tan Hao Li

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

Luozheng Qin Jia Gong Qian Qiao Tianjiao Li Li Xu Haoyu Pan Chao Qu Zhiyu Tan Hao Li

Abstract

Build AI with AI

HyperAI Newsletters

Command Palette

Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator

Luozheng Qin Jia Gong Qian Qiao Tianjiao Li Li Xu Haoyu Pan Chao Qu Zhiyu Tan Hao Li

Abstract

Build AI with AI

HyperAI Newsletters