Wang Qixun ; Bai Xu ; Wang Haofan ; Qin Zekui ; Chen Anthony ; Li Huaxia ; Tang Xu ; Hu Yao

Abstract
There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin. Our codes and pre-trained checkpoints will be available at https://github.com/InstantID/InstantID.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| diffusion-personalization-tuning-free-on | InstantID | Cosine Similarity: 0.713; FID: 18.598; LPIPS: 0.437 |
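
The Cosine Similarity figure in the table is presumably an identity-preservation score computed between embedding vectors of the reference face and the generated face; this page does not say which embedding model or evaluation protocol was used. As a minimal sketch of the metric itself, the snippet below computes cosine similarity for two short, made-up vectors. The function name and the example values are illustrative assumptions, not part of InstantID.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional "embeddings"; real face embeddings are far longer.
reference_embedding = [1.0, 0.0, 1.0, 0.0]
generated_embedding = [1.0, 1.0, 0.0, 0.0]

score = cosine_similarity(reference_embedding, generated_embedding)
print(f"identity similarity: {score:.3f}")
```

A score near 1 means the two embeddings point in nearly the same direction, while a score near 0 means they are close to orthogonal.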