Lvmin Zhang, Anyi Rao, Maneesh Agrawala

Abstract
We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models and reuses their deep and robust encoding layers, pretrained with billions of images, as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, e.g., edges, depth, segmentation, human pose, etc., with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1M) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.
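To make the "zero convolution" idea concrete, here is a minimal PyTorch sketch of how a zero-initialized 1x1 convolution can gate a trainable branch attached to a locked, pretrained block. This is an illustration under assumptions, not the authors' implementation: the names `zero_conv`, `ControlNetBlock`, `locked_block`, and `trainable_block` are hypothetical, and the real ControlNet wires these connections into Stable Diffusion's U-Net encoder.

```python
import torch
import torch.nn as nn


def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution whose weights and bias start at zero; its parameters
    # "progressively grow from zero" during finetuning.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv


class ControlNetBlock(nn.Module):
    """Illustrative wrapper: a frozen pretrained block plus a trainable copy
    conditioned on a control signal, joined by zero convolutions."""

    def __init__(self, locked_block: nn.Module, trainable_block: nn.Module, channels: int):
        super().__init__()
        self.locked_block = locked_block
        for p in self.locked_block.parameters():
            p.requires_grad_(False)  # lock the production-ready backbone
        self.trainable_block = trainable_block  # finetuned copy
        self.zero_in = zero_conv(channels)      # injects the conditioning input
        self.zero_out = zero_conv(channels)     # gates the trainable branch's output

    def forward(self, x: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        locked_out = self.locked_block(x)
        control = self.trainable_block(x + self.zero_in(condition))
        # At initialization both zero convs output zeros, so the result equals the
        # pretrained block's output and no harmful noise perturbs finetuning.
        return locked_out + self.zero_out(control)
```

Because the zero convolutions output exactly zero before training, the wrapped model starts out identical to the pretrained backbone, and the conditional control path only takes effect as its parameters grow during finetuning.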
Benchmarks
| Benchmark | Method | AP |
|---|---|---|
| layout-to-image-generation-on-layoutbench-1 | ControlNet | 9.2 |
| layout-to-image-generation-on-layoutbench-2 | ControlNet | 15.3 |
| layout-to-image-generation-on-layoutbench-3 | ControlNet | 10.8 |
| layout-to-image-generation-on-layoutbench-4 | ControlNet | 6.4 |