
Abstract
Generating novel, yet realistic, images of persons is a challenging task due to the complex interplay between the different image factors, such as the foreground, background and pose information. In this work, we aim at generating such images based on a novel, two-stage reconstruction pipeline that learns a disentangled representation of the aforementioned image factors and generates novel person images at the same time. First, a multi-branched reconstruction network is proposed to disentangle and encode the three factors into embedding features, which are then combined to re-compose the input image itself. Second, three corresponding mapping functions are learned in an adversarial manner in order to map Gaussian noise to the learned embedding feature space, for each factor respectively. Using the proposed framework, we can manipulate the foreground, background and pose of the input image, and also sample new embedding features to generate such targeted manipulations, which provide more control over the generation process. Experiments on the Market-1501 and DeepFashion datasets show that our model not only generates realistic person images with new foregrounds, backgrounds and poses, but also manipulates the generated factors and interpolates the in-between states. Another set of experiments on Market-1501 shows that our model can also be beneficial for the person re-identification task.
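The two-stage pipeline can be sketched at a high level as follows. This is an illustrative NumPy toy, not the paper's networks: the encoder branches, decoder and per-factor mappers are hypothetical stand-ins (random linear maps), whereas the paper learns them with convolutional networks and adversarial training.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_EMB = 64, 16  # hypothetical image/embedding dimensions

FACTORS = ("foreground", "background", "pose")

# Stage 1: three "encoder branches" disentangle the image factors into
# embedding features (linear maps stand in for the real CNN branches).
enc = {k: rng.standard_normal((D_EMB, D_IMG)) * 0.1 for k in FACTORS}
dec = rng.standard_normal((D_IMG, 3 * D_EMB)) * 0.1  # recomposition decoder

def encode(image):
    """Disentangle an image into the three embedding features."""
    return {k: W @ image for k, W in enc.items()}

def recompose(embeddings):
    """Combine the three factor embeddings to re-compose an image."""
    z = np.concatenate([embeddings[k] for k in FACTORS])
    return dec @ z

# Stage 2: one mapping function per factor takes Gaussian noise into the
# learned embedding space (adversarially trained in the paper; random here).
mapper = {k: rng.standard_normal((D_EMB, D_EMB)) * 0.1 for k in FACTORS}

def sample_factor(name):
    """Sample a new embedding for one factor from Gaussian noise."""
    noise = rng.standard_normal(D_EMB)
    return mapper[name] @ noise

# Targeted manipulation: keep foreground and pose, sample a new background.
image = rng.standard_normal(D_IMG)
emb = encode(image)
emb["background"] = sample_factor("background")
new_image = recompose(emb)
```

Because each factor has its own embedding and its own sampler, any subset of factors can be swapped or interpolated independently before recomposition, which is the source of the control over the generation process described above.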
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| gesture-to-gesture-translation-on-ntu-hand | DPIG | AMT: 7.1 IS: 2.4547 PSNR: 30.6487 |
| gesture-to-gesture-translation-on-senz3d | DPIG | AMT: 6.9 IS: 3.3874 PSNR: 26.9451 |
| pose-transfer-on-deep-fashion | Disentangled PG | IS: 3.228 SSIM: 0.614 |