a month ago

Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation

Yanzuo Lu Xin Xia Manlin Zhang Huafeng Kuang Jianbin Zheng Yuxi Ren Xuefeng Xiao

Abstract

Unified multimodal models have recently attracted considerable attention for their remarkable abilities in jointly understanding and generating diverse content. However, as contexts integrate increasingly numerous interleaved multimodal tokens, the iterative processes of diffusion denoising and autoregressive decoding impose significant computational overhead. To address this, we propose Hyper-Bagel, a unified acceleration framework designed to simultaneously speed up both multimodal understanding and generation tasks. Our approach uses a divide-and-conquer strategy, employing speculative decoding for next-token prediction and a multi-stage distillation process for diffusion denoising. The framework delivers substantial performance gains, achieving over a 2x speedup in multimodal understanding. For generative tasks, our resulting lossless 6-NFE model yields a 16.67x speedup in text-to-image generation and a 22x speedup in image editing, all while preserving the high-quality output of the original model. We further develop a highly efficient 1-NFE model that enables near real-time interactive editing and generation. By combining advanced adversarial distillation with human feedback learning, this model achieves ultimate cost-effectiveness and responsiveness, making complex multimodal interactions seamless and instantaneous.
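
The abstract names speculative decoding but does not describe how Hyper-Bagel implements it. As a generic illustration of the technique only (not the authors' method), the following minimal Python sketch shows greedy speculative decoding: a cheap draft model proposes a few tokens, and the target model verifies them and keeps the longest matching prefix. The names `speculative_decode`, `target_next`, and `draft_next` are hypothetical, and the two "models" are toy functions chosen so the example runs on its own.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding (illustrative sketch).

    Returns the generated sequence and the number of verification rounds.
    In a real system each round is one batched forward pass of the target.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1) The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2) The target checks each proposed token in order.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)        # draft token confirmed
            else:
                accepted.append(expected)   # replace first mismatch, stop
                break
        else:
            # All k drafts accepted: the target contributes one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + max_new], rounds


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain one-token-at-a-time decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


# Toy stand-ins (assumptions for illustration): the target counts modulo 10,
# and the draft agrees except that it wrongly predicts 0 after a 2.
def target_next(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def draft_next(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 2 else (prefix[-1] + 1) % 10


tokens, rounds = speculative_decode(target_next, draft_next, [0], max_new=12)
reference = greedy_decode(target_next, [0], max_new=12)
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding with the target alone. Any speedup comes from needing fewer target rounds than generated tokens, which in turn depends on how often the draft agrees with the target.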
