LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence

Zixin Yin Xili Dai Duomin Wang Xianfang Zeng Lionel M. Ni Gang Yu Heung-Yeung Shum

Abstract

The reliance on implicit point matching via attention has become a core bottleneck in drag-based editing, resulting in a fundamental compromise of weakened inversion strength and costly test-time optimization (TTO). This compromise severely limits the generative capabilities of diffusion models, suppressing high-fidelity inpainting and text-guided creation. In this paper, we introduce LazyDrag, the first drag-based image editing method for Multi-Modal Diffusion Transformers, which directly eliminates the reliance on implicit point matching. Concretely, our method generates an explicit correspondence map from user drag inputs as a reliable reference to boost the attention control. This reliable reference opens the potential for a stable full-strength inversion process, a first in the drag-based editing task. It obviates the necessity for TTO and unlocks the generative capability of models. LazyDrag therefore naturally unifies precise geometric control with text guidance, enabling complex edits that were previously out of reach: opening the mouth of a dog and inpainting its interior, generating new objects like a "tennis ball", or, for ambiguous drags, making context-aware changes like moving a hand into a pocket. Additionally, LazyDrag supports multi-round workflows with simultaneous move and scale operations. Evaluated on DragBench, our method outperforms baselines in drag accuracy and perceptual quality, as validated by VIEScore and human evaluation. LazyDrag not only establishes new state-of-the-art performance, but also paves the way for new editing paradigms.
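The abstract's central mechanism, an explicit correspondence map built from user drag inputs and used to steer attention, might be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the grid sizes, the flattened-token indexing, the function names, and the additive logit bias are all assumptions introduced here for clarity.

```python
def build_correspondence_map(drag_points, latent_hw=(64, 64), image_hw=(512, 512)):
    """Convert user drags (source -> target, in pixel coordinates) into pairs
    of flattened latent-token indices. Purely illustrative: the paper's actual
    construction of the correspondence map is not specified in the abstract."""
    h, w = latent_hw
    sy = image_hw[0] / h  # pixels per latent row
    sx = image_hw[1] / w  # pixels per latent column
    pairs = []
    for (x0, y0), (x1, y1) in drag_points:
        src = int(y0 // sy) * w + int(x0 // sx)  # flatten (row, col) -> token index
        tgt = int(y1 // sy) * w + int(x1 // sx)
        pairs.append((src, tgt))
    return pairs


def boost_attention(attn_logits, pairs, strength=1.0):
    """Bias each target-token query toward its matched source key by adding
    `strength` to the corresponding attention logit (a toy stand-in for the
    attention control the abstract describes). `attn_logits` is a nested
    [query][key] list of floats, modified in place and returned."""
    for src, tgt in pairs:
        attn_logits[tgt][src] += strength
    return attn_logits
```

With an 8x8 latent grid over a 64x64 image (so one token per 8x8 pixel patch), a drag from (16, 16) to (32, 16) maps source token 18 to target token 20, and the bias is written at `attn_logits[20][18]`.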
