DRIT++: Diverse Image-to-Image Translation via Disentangled Representations

Hsin-Ying Lee, Hung-Yu Tseng, Qi Mao, Jia-Bin Huang, Yu-Ding Lu, Maneesh Singh, Ming-Hsuan Yang

Research output: Contribution to journal › Article › peer-review

186 Citations (Scopus)


Image-to-image translation aims to learn the mapping between two visual domains. There are two main challenges for this task: (1) the lack of aligned training pairs and (2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representation for generating diverse outputs without paired training images. To synthesize diverse outputs, we propose to embed images onto two spaces: a domain-invariant content space capturing shared information across domains and a domain-specific attribute space. Our model takes the encoded content features extracted from a given input and attribute vectors sampled from the attribute space to synthesize diverse outputs at test time. To handle unpaired training data, we introduce a cross-cycle consistency loss based on disentangled representations. Qualitative results show that our model can generate diverse and realistic images on a wide range of tasks without paired training data. For quantitative evaluations, we measure realism with a user study and the Fréchet inception distance, and measure diversity with the perceptual distance metric, Jensen–Shannon divergence, and the number of statistically-different bins.
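The cross-cycle consistency idea described in the abstract can be illustrated with a toy numpy sketch. This is not the paper's implementation: the real model uses learned encoders and generators, while here "content" and "attribute" are simply the two halves of a vector, chosen so that the cross-cycle reconstruction is exact under ideal disentanglement. The function names (`encode`, `generate`, `cross_cycle_loss`) are hypothetical.

```python
import numpy as np

def encode(img):
    # Toy disentanglement: first half stands in for the domain-invariant
    # content code, second half for the domain-specific attribute code.
    half = len(img) // 2
    return img[:half], img[half:]

def generate(content, attribute):
    # Toy generator: simply concatenates content and attribute codes.
    return np.concatenate([content, attribute])

def cross_cycle_loss(x, y):
    # First translation: swap attributes across the two domains.
    c_x, a_x = encode(x)
    c_y, a_y = encode(y)
    u = generate(c_x, a_y)  # x's content rendered with y's attribute
    v = generate(c_y, a_x)  # y's content rendered with x's attribute
    # Second translation: swap the attributes back; with perfect
    # disentanglement this reconstructs the original images.
    c_u, a_u = encode(u)
    c_v, a_v = encode(v)
    x_hat = generate(c_u, a_v)
    y_hat = generate(c_v, a_u)
    # L1 reconstruction penalty on both cross-cycle reconstructions.
    return np.abs(x_hat - x).mean() + np.abs(y_hat - y).mean()

x = np.array([1.0, 2.0, 3.0, 4.0])  # stand-in for an image from domain X
y = np.array([5.0, 6.0, 7.0, 8.0])  # stand-in for an image from domain Y
print(cross_cycle_loss(x, y))  # → 0.0 under this ideal toy disentanglement
```

In training, this loss is driven toward zero alongside adversarial and other objectives, which supervises translation across domains without any aligned image pairs.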

Original language: English
Pages (from-to): 2402-2417
Number of pages: 16
Journal: International Journal of Computer Vision
Issue number: 10-11
Publication status: Published - 2020 Nov 1

Bibliographical note

Funding Information:
This work is supported in part by the NSF CAREER Grant #1149783, the NSF Grant #1755785, and gifts from Verisk, Adobe and Google.

Publisher Copyright:
© 2020, Springer Science+Business Media, LLC, part of Springer Nature.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
