Abstract

Despite remarkable recent progress on both unconditional and conditional image synthesis, it remains a long-standing problem to learn generative models that are capable of synthesizing realistic and sharp images from reconfigurable spatial layout (i.e., bounding boxes + class labels in an image lattice) and style (i.e., structural and appearance variations encoded by latent vectors), especially at high resolution. By reconfigurable, we mean that a model can preserve the intrinsic one-to-many mapping from a given layout to multiple plausible images with different styles, and is adaptive with respect to perturbations of the layout and the style latent code. In this paper, we present a layout- and style-based architecture for generative adversarial networks (termed LostGANs) that can be trained end-to-end to generate images from reconfigurable layout and style. Inspired by the vanilla StyleGAN, the proposed LostGAN consists of two new components: (i) learning fine-grained mask maps in a weakly-supervised manner to bridge the gap between layouts and images, and (ii) learning object instance-specific layout-aware feature normalization (ISLA-Norm) in the generator to realize multi-object style generation. In experiments, the proposed method is tested on the COCO-Stuff dataset and the Visual Genome dataset, where it obtains state-of-the-art performance. The code and pretrained models are available at https://github.com/iVMCL/LostGANs.
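
To make the idea of instance-specific, layout-aware feature normalization more concrete, the following is a minimal NumPy sketch and not the released implementation. It assumes that each object instance contributes one mask and one pair of per-channel affine parameters, and that these are spread over the image lattice through the masks. The function names `boxes_to_masks` and `layout_aware_norm`, the use of plain box masks in place of learned fine-grained masks, the averaging over overlapping masks, and the `1 + gamma` form are all illustrative assumptions. In a full model the affine parameters would presumably be predicted from each instance's class label and style latent vector; here they are passed in directly.

```python
import numpy as np


def boxes_to_masks(boxes, height, width):
    """Rasterize normalized (x0, y0, x1, y1) boxes into binary masks of shape (m, H, W).

    Assumption: plain box masks stand in for the learned fine-grained masks.
    """
    masks = np.zeros((len(boxes), height, width), dtype=np.float32)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        r0, r1 = int(round(y0 * height)), int(round(y1 * height))
        c0, c1 = int(round(x0 * width)), int(round(x1 * width))
        masks[i, r0:r1, c0:c1] = 1.0
    return masks


def layout_aware_norm(features, masks, gamma, beta, eps=1e-5):
    """Apply spatially varying affine parameters after normalization.

    features: (C, H, W) feature map of one image
    masks:    (m, H, W) one mask per object instance
    gamma:    (m, C) per-instance scale offsets (hypothetical inputs)
    beta:     (m, C) per-instance shifts (hypothetical inputs)
    """
    # Normalize each channel over spatial positions; this is a single-image
    # stand-in for the batch statistics a generator would normally use.
    mean = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)

    # Spread each instance's parameters over its mask, averaging where
    # masks overlap (an assumption of this sketch).
    coverage = masks.sum(axis=0)                 # (H, W)
    weights = masks / np.maximum(coverage, 1.0)  # (m, H, W)
    gamma_map = np.einsum("mhw,mc->chw", weights, gamma)
    beta_map = np.einsum("mhw,mc->chw", weights, beta)

    # Using (1 + gamma) leaves pixels covered by no instance unchanged.
    return normalized * (1.0 + gamma_map) + beta_map
```

Under these assumptions, changing the parameters of one instance alters only the features inside that instance's mask, which is one way to read the reconfigurability property described above: the layout fixes where each style acts, while the per-instance parameters control how it acts.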
