Text-adaptive generative adversarial networks: Manipulating images with natural language

Seonghyeon Nam, Yunji Kim, Seon Joo Kim

Research output: Contribution to journal › Conference article › peer-review

98 Citations (Scopus)

Abstract

This paper addresses the problem of manipulating images using natural language descriptions. Our task aims to semantically modify visual attributes of an object in an image according to the text describing the new visual appearance. Although existing methods synthesize images with new attributes, they do not fully preserve text-irrelevant contents of the original image. In this paper, we propose the text-adaptive generative adversarial network (TAGAN) to generate semantically manipulated images while preserving text-irrelevant contents. The key to our method is the text-adaptive discriminator that creates word-level local discriminators according to the input text to classify fine-grained attributes independently. With this discriminator, the generator learns to generate images where only the regions that correspond to the given text are modified. Experimental results show that our method outperforms existing methods on the CUB and Oxford-102 datasets, and our results were mostly preferred in a user study. Extensive analysis shows that our method is able to effectively disentangle visual attributes and produce pleasing outputs.
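To make the abstract's key idea concrete, the sketch below illustrates how a "text-adaptive discriminator" might instantiate a word-level local discriminator per word and aggregate their decisions. This is a minimal, hypothetical PyTorch sketch: the layer sizes, the linear per-word classifiers, the softmax attention, and the weighted-sum aggregation are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextAdaptiveDiscriminator(nn.Module):
    """Sketch of a discriminator whose per-word classifiers are generated from text."""

    def __init__(self, img_dim=256, word_dim=300):
        super().__init__()
        # Shared projection of image features (assumed backbone output of size img_dim).
        self.img_fc = nn.Linear(img_dim, img_dim)
        # Each word embedding is mapped to the weights/bias of its own local classifier.
        self.word_to_weight = nn.Linear(word_dim, img_dim)
        self.word_to_bias = nn.Linear(word_dim, 1)
        # Attention scores decide how much each word contributes to the final decision.
        self.word_attn = nn.Linear(word_dim, 1)

    def forward(self, img_feat, word_embs):
        # img_feat:  (B, img_dim)      image features from a CNN encoder
        # word_embs: (B, T, word_dim)  embeddings of the T words in the caption
        h = self.img_fc(img_feat)                             # (B, img_dim)
        w = self.word_to_weight(word_embs)                    # (B, T, img_dim)
        b = self.word_to_bias(word_embs).squeeze(-1)          # (B, T)
        # Word-level local discriminators: each word's linear classifier is
        # applied to the image features, judging that attribute independently.
        local_logits = torch.einsum('bd,btd->bt', h, w) + b   # (B, T)
        local_scores = torch.sigmoid(local_logits)            # (B, T)
        # Soft attention over words, so filler words contribute little.
        attn = F.softmax(self.word_attn(word_embs).squeeze(-1), dim=1)  # (B, T)
        # Aggregate the independent word-level decisions into one score.
        return (attn * local_scores).sum(dim=1)               # (B,)


# Usage sketch with random tensors (batch of 4 images, 8-word captions).
disc = TextAdaptiveDiscriminator()
score = disc(torch.randn(4, 256), torch.randn(4, 8, 300))
print(score.shape)  # torch.Size([4])
```

Because each word yields its own classifier, the gradient signal to the generator is attribute-specific, which is what (per the abstract) encourages it to modify only the text-relevant regions of the image.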

Original language: English
Pages (from-to): 42-51
Number of pages: 10
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
Publication status: Published - 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: 2018 Dec 2 - 2018 Dec 8

Bibliographical note

Funding Information:
Acknowledgement: This work was supported by the Global Ph.D. Fellowship Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF2015H1A2A1033924), an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (2018-0-01858, Video Manipulation and Language-based Image Editing Technique for Detecting Manipulated Image/Video), and the ICT R&D program of MSIT/IITP (2017-0-01772, Development of QA systems for Video Story Understanding to pass the Video Turing Test).

Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
