Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu

Abstract

Text-guided diffusion models have revolutionized image generation and editing, offering exceptional realism and diversity. Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model. This vector is subsequently fed into separate source and target diffusion branches for editing. The accuracy of this inversion process significantly impacts the final editing outcome, influencing both essential content preservation of the source image and edit fidelity according to the target prompt. Prior inversion techniques aimed at finding a unified solution in both the source and target diffusion branches. However, our theoretical and empirical analyses reveal that disentangling these branches leads to a distinct separation of responsibilities for preserving essential content and ensuring edit fidelity. Building on this insight, we introduce "Direct Inversion," a novel technique achieving optimal performance of both branches with just three lines of code. To assess image editing performance, we present PIE-Bench, an editing benchmark with 700 images showcasing diverse scenes and editing types, accompanied by versatile annotations and comprehensive evaluation metrics. Compared to state-of-the-art optimization-based inversion techniques, our solution not only yields superior performance across 8 editing methods but also achieves nearly an order-of-magnitude speed-up.
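The abstract's core idea can be sketched in plain Python: record the latents produced along the inversion path, then, during dual-branch editing, snap the source branch back to the recorded latent after each denoising step while leaving the target branch free. This is a minimal schematic with scalar "latents" and toy step functions standing in for a real diffusion model; the function names are illustrative assumptions, not the paper's actual code.

```python
def ddim_inversion(z0, steps, invert_step):
    """Invert a source latent z0 toward noise, recording every
    intermediate latent z_0 .. z_T along the inversion path."""
    latents = [z0]
    z = z0
    for t in range(steps):
        z = invert_step(z, t)  # one approximate inverse-diffusion step
        latents.append(z)
    return latents

def edit_with_direct_inversion(inv_latents, denoise_src, denoise_tgt):
    """Two-branch editing loop with the Direct Inversion override.

    After each source-branch denoising step, the source latent is
    overridden with the recorded inversion latent (the 'three lines'
    in spirit), so the source branch exactly retraces the inversion
    path and preserves source content. The target branch denoises
    freely under the target prompt, giving edit fidelity."""
    T = len(inv_latents) - 1
    z_src = inv_latents[T]
    z_tgt = inv_latents[T]
    for i in range(T, 0, -1):
        z_src = denoise_src(z_src, i)
        z_src = inv_latents[i - 1]     # Direct Inversion: snap source branch back
        z_tgt = denoise_tgt(z_tgt, i)  # target branch left untouched
    return z_src, z_tgt

# Toy usage: an imperfect source denoiser (z - 0.9) would drift,
# but the override lands the source branch back on z0 exactly.
latents = ddim_inversion(0.0, 5, lambda z, t: z + 1.0)
src, tgt = edit_with_direct_inversion(
    latents, lambda z, i: z - 0.9, lambda z, i: z - 1.0
)
# src == 0.0: the source branch recovers the source latent exactly
```

Note how the override decouples the two responsibilities the abstract names: content preservation is handled entirely by replaying the recorded path, so no per-step optimization is needed, which is where the speed-up over optimization-based inversion comes from.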

Code Repositories

cure-lab/pnpinversion (PyTorch, mentioned in GitHub)
cure-lab/directinversion (Official, PyTorch, mentioned in GitHub)
thu-cvml/texturediffusion (PyTorch, mentioned in GitHub)

Benchmarks

Benchmark: text-based-image-editing-on-pie-bench

Method                             Background LPIPS  Background PSNR  CLIPSIM  Structure Distance
Direct Inversion+MasaCtrl          87.94             22.64            24.38    24.70
Direct Inversion+Pix2Pix-Zero      138.98            21.53            23.31    49.22
Direct Inversion+Prompt-to-Prompt  54.55             27.22            25.02    11.65
Direct Inversion+Plug-and-Play     106.06            22.46            25.41    24.29
