In-sample Contrastive Learning and Consistent Attention for Weakly Supervised Object Localization

Minsong Ki, Youngjung Uh, Wonyoung Lee, Hyeran Byun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Weakly supervised object localization (WSOL) aims to localize the target object using only image-level supervision. Recent methods encourage the model to activate feature maps over the entire object by dropping the most discriminative parts. However, they tend to extend the activation excessively into the background, which leads to over-estimated localization. In this paper, we consider the background as an important cue that guides the feature activation to cover the sophisticated object region, and we propose a contrastive attention loss. The loss promotes similarity between the foreground and its dropped version, and dissimilarity between the dropped version and the background. Furthermore, we propose a foreground consistency loss that penalizes earlier layers for producing noisy attention, using the later layer as a reference to provide them with a sense of backgroundness. It guides the early layers to activate on objects rather than on locally distinctive backgrounds, so that their attention becomes similar to that of the later layer. To better optimize the above losses, we replace channel-pooled attention with non-local attention blocks, which yields enhanced attention maps that take spatial similarity into account. Last but not least, we propose to drop background regions in addition to the most discriminative region. Our method achieves state-of-the-art performance on the CUB-200-2011 and ImageNet benchmark datasets in terms of top-1 localization accuracy and MaxBoxAccV2, and we provide a detailed analysis of the individual components. The code will be publicly available online for reproducibility.
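The two losses named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's formulas, so everything below is an assumption made for illustration: the margin-based triplet form, the L2 distance, the mask-weighted feature averaging, the mean-squared-error consistency term, and all function and parameter names are hypothetical and are not taken from the authors' code.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def masked_mean(features, mask):
    """Average per-location feature vectors, weighted by a soft mask in [0, 1]."""
    total = sum(mask)
    dim = len(features[0])
    if total == 0:
        return [0.0] * dim
    return [sum(m * f[k] for f, m in zip(features, mask)) / total
            for k in range(dim)]


def contrastive_attention_loss(features, fg_mask, dropped_fg_mask, bg_mask,
                               margin=1.0):
    """Assumed triplet-style form: the dropped foreground is the anchor,
    the foreground is the positive, and the background is the negative."""
    fg = masked_mean(features, fg_mask)
    dropped = masked_mean(features, dropped_fg_mask)
    bg = masked_mean(features, bg_mask)
    return max(0.0,
               l2_distance(dropped, fg) - l2_distance(dropped, bg) + margin)


def foreground_consistency_loss(early_attention, late_attention):
    """Assumed mean squared error between an early-layer attention map and
    the later-layer map used as a fixed reference (both flattened and
    already resized to the same length)."""
    n = len(late_attention)
    return sum((e - l) ** 2
               for e, l in zip(early_attention, late_attention)) / n


# Toy example: four spatial locations with two-dimensional features.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
loss = contrastive_attention_loss(
    features,
    fg_mask=[1, 1, 0, 0],          # foreground locations
    dropped_fg_mask=[0, 1, 0, 0],  # foreground with its top location dropped
    bg_mask=[0, 0, 1, 1],          # background locations
)
consistency = foreground_consistency_loss([1, 0, 1, 0], [1, 0, 0, 0])
```

In the toy example the dropped foreground already matches the foreground and lies far from the background, so the contrastive term falls to zero, while the early-layer map is penalized for the one background location it activates on.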

Original language: English
Title of host publication: Computer Vision – ACCV 2020 - 15th Asian Conference on Computer Vision, 2020, Revised Selected Papers
Editors: Hiroshi Ishikawa, Cheng-Lin Liu, Tomas Pajdla, Jianbo Shi
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 3-18
Number of pages: 16
ISBN (Print): 9783030695378
DOIs
Publication status: Published - 2021
Event: 15th Asian Conference on Computer Vision, ACCV 2020 - Virtual, Online
Duration: 2020 Nov 30 – 2020 Dec 4

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12625 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Asian Conference on Computer Vision, ACCV 2020
City: Virtual, Online
Period: 20/11/30 – 20/12/4

Bibliographical note

Funding Information:
Acknowledgements. This work was supported by the National Research Foundation of Korea grant funded by the Korean government (No. NRF-2019R1A2C2003760) and by the Artificial Intelligence Graduate School Program (YONSEI UNIVERSITY) under Grant 2020-0-01361. We thank Junsuk Choe for his valuable discussion.

Publisher Copyright:
© 2021, Springer Nature Switzerland AG.

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
