Abstract

There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin. Our codes and pre-trained checkpoints will be available at https://github.com/InstantID/InstantID.
