Selective residual learning for Visual Question Answering

Jongkwang Hong, Sungho Park, Hyeran Byun

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)


Visual Question Answering (VQA) aims to infer an answer given a textual question and image pair. VQA methods are required to learn the relationships between image region features (intra-relationships). Current intra-relationship methods learn inefficiently, which can cause a performance drop, because they try to learn all the intra-relationships regardless of their importance. In this paper, a novel self-attention based VQA module named Selective Residual learning (SelRes) is proposed. SelRes applies residual learning selectively in self-attention networks. It measures the importance of the input vectors using the attention map and restricts residual learning to the selected regions that are related to the correct answer. Selective masking is also proposed, which ensures that the selection made in SelRes is preserved in the multi-stack structure of the VQA network. Our full model achieves new state-of-the-art performance for both from-scratch and fine-tuned models.
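The abstract gives no equations, so the following is only a minimal NumPy sketch of one possible reading of the selective residual step, not the authors' implementation. It assumes that a region's importance is the mean attention it receives, that the top `k` regions are selected, and that non-selected regions pass through unchanged. The function name `selective_residual_attention`, the weight matrices, and `k` are all hypothetical, and selective masking across stacked layers is not covered.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def selective_residual_attention(x, w_q, w_k, w_v, k):
    """Single-head self-attention whose residual update is applied
    only to the k regions judged most important (assumed reading).

    x: (n, d) region features; w_q, w_k, w_v: (d, d) projections.
    Returns the updated features and the boolean selection mask.
    """
    d = w_q.shape[1]
    q, key, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ key.T / np.sqrt(d), axis=-1)  # (n, n) attention map

    # Assumption: importance of a region = mean attention it receives.
    importance = attn.mean(axis=0)
    selected = np.argsort(importance)[-k:]
    mask = np.zeros(x.shape[0], dtype=bool)
    mask[selected] = True

    update = attn @ v  # learned residual branch
    out = x.copy()
    out[mask] = x[mask] + update[mask]  # other regions keep their input
    return out, mask
```

With six regions and `k=2`, for example, only two rows of the output differ from the input; the remaining four are identical to it.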

Original language: English
Pages (from-to): 366-374
Number of pages: 9
Publication status: Published - 2020 Aug 18

Bibliographical note

Publisher Copyright:
© 2020 Elsevier B.V.

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

