OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Abstract
Recent advances in video insertion based on diffusion models are impressive. However, existing methods rely on complex control signals but struggle with subject consistency, limiting their practical applicability. In this paper, we focus on the task of Mask-free Video Insertion and aim to resolve three key challenges: data scarcity, subject-scene equilibrium, and insertion harmonization. To address the data scarcity, we propose a new data pipeline, InsertPipe, which automatically constructs diverse cross-pair data. Building upon our data pipeline, we develop OmniInsert, a novel unified framework for mask-free video insertion from both single and multiple subject references. Specifically, to maintain subject-scene equilibrium, we introduce a simple yet effective Condition-Specific Feature Injection mechanism to distinctly inject multi-source conditions and propose a novel Progressive Training strategy that enables the model to balance feature injection from the subjects and the source video. Meanwhile, we design a Subject-Focused Loss to improve the detailed appearance of the subjects. To further enhance insertion harmonization, we propose an Insertive Preference Optimization methodology that optimizes the model by simulating human preferences, and incorporate a Context-Aware Rephraser module during inference to seamlessly integrate the subject into the original scenes. To address the lack of a benchmark for this field, we introduce InsertBench, a comprehensive benchmark comprising diverse scenes with meticulously selected subjects. Evaluation on InsertBench indicates that OmniInsert outperforms state-of-the-art closed-source commercial solutions. The code will be released.