Liu Chang; Li Rui; Zhang Kaidong; Luo Xin; Liu Dong

Abstract
Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To offer more controllability for the generation process, existing studies, termed as early-constraint methods in this paper, leverage extra conditions and incorporate them into pre-trained diffusion models. Particularly, some of them adopt condition-specific modules to handle conditions separately, where they struggle to generalize across other conditions. Although follow-up studies present unified solutions to solve the generalization problem, they also require extra resources to implement, e.g., additional inputs or parameter optimization, where more flexible and efficient solutions are expected to perform steerable guided image synthesis. In this paper, we present an alternative paradigm, namely Late-Constraint Diffusion (LaCon), to simultaneously integrate various conditions into pre-trained diffusion models. Specifically, LaCon establishes an alignment between the external condition and the internal features of diffusion models, and utilizes the alignment to incorporate the target condition, guiding the sampling process to produce tailored results. Experimental results on COCO dataset illustrate the effectiveness and superior generalization capability of LaCon under various conditions and settings. Ablation studies investigate the functionalities of different components in LaCon, and illustrate its great potential to serve as an efficient solution to offer flexible controllability for diffusion models.
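As a rough illustration of the idea of steering a sample through an alignment between internal features and an external condition, the toy sketch below uses fixed linear maps in place of a real diffusion model. Everything in it is an assumption made for illustration: the names `guided_step` and `condition_distance`, the matrices `W` (a stand-in for internal features) and `A` (a stand-in for a learned feature-to-condition alignment), and the squared-error objective. It is not the authors' implementation, and it omits the denoising steps that a real sampler would interleave with the guidance.

```python
import numpy as np

def guided_step(x, target, W, A, step_size):
    """One gradient step that nudges sample x toward the target condition."""
    features = W @ x            # hypothetical stand-in for internal features
    predicted = A @ features    # hypothetical stand-in for the learned alignment
    residual = predicted - target
    # Gradient of 0.5 * ||A W x - target||^2 with respect to x.
    grad = W.T @ (A.T @ residual)
    return x - step_size * grad

def condition_distance(x, target, W, A):
    """Euclidean distance between the predicted and the target condition."""
    return float(np.linalg.norm(A @ (W @ x) - target))

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))     # toy feature extractor
A = rng.normal(size=(3, 8))     # toy condition aligner
x = rng.normal(size=4)          # toy "sample"
target = rng.normal(size=3)     # toy target condition

# Step size 1/L, where L is the squared spectral norm of A @ W,
# keeps the quadratic objective from increasing.
step_size = 1.0 / np.linalg.norm(A @ W, 2) ** 2

before = condition_distance(x, target, W, A)
for _ in range(50):
    x = guided_step(x, target, W, A, step_size)
after = condition_distance(x, target, W, A)

print(f"distance before: {before:.4f}, after: {after:.4f}")
```

In this toy setting the guidance only touches the sample, not the weights of `W` or `A`, which loosely mirrors the abstract's point that the pre-trained model itself is left unchanged.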
Benchmarks
| Benchmark | Methodology | CLIP Score | FID |
|---|---|---|---|
| conditional-text-to-image-synthesis-on-coco | SD using SDEdit | - | 71.16 |
| conditional-text-to-image-synthesis-on-coco | SD using SDEdit (evaluated under color stroke) | 0.2257 | 32.93 |
| conditional-text-to-image-synthesis-on-coco | SD using SDEdit (evaluated under image palette) | 0.2138 | - |
| conditional-text-to-image-synthesis-on-coco | LCDG (Color, evaluated under image palette) | 0.2580 | 20.61 |
| conditional-text-to-image-synthesis-on-coco | SD (text) | 0.2673 | 27.99 |
| conditional-text-to-image-synthesis-on-coco | LCDG (Edge) | - | 21.02 |
| conditional-text-to-image-synthesis-on-coco | LCDG | - | 20.27 |
| conditional-text-to-image-synthesis-on-coco | T2I-Adapter (Sketch) | 0.2580 | 21.72 |
| conditional-text-to-image-synthesis-on-coco | T2I-Adapter (Color, evaluated under image palette) | 0.2613 | 26.54 |
| conditional-text-to-image-synthesis-on-coco | T2I-Adapter (Color, evaluated under color stroke) | - | 30.84 |
| conditional-text-to-image-synthesis-on-coco | LCDG (Mask) | 0.2617 | 20.94 |
| conditional-text-to-image-synthesis-on-coco | ControlNet (HED Edge) | 0.2525 | 28.09 |