Contrastive Attention Maps for Self-supervised Co-localization

Minsong Ki, Youngjung Uh, Junsuk Choe, Hyeran Byun

Research output: Chapter in Book/Report/Conference proceedingConference contribution

6 Citations (Scopus)


The goal of unsupervised co-localization is to locate the object in a scene under the assumptions that 1) the dataset consists of only one superclass, e.g., birds, and 2) there are no human-annotated labels in the dataset. The most recent method achieves impressive co-localization performance by employing self-supervised representation learning approaches such as predicting rotation. In this paper, we introduce a new contrastive objective directly on the attention maps to enhance co-localization performance. Our contrastive loss function exploits rich information of location, which induces the model to activate the extent of the object effectively. In addition, we propose a pixel-wise attention pooling that selectively aggregates the feature map regarding their magnitudes across channels. Our methods are simple and shown effective by extensive qualitative and quantitative evaluation, achieving state-of-the-art co-localization performances by large margins on four datasets: CUB-200-2011, Stanford Cars, FGVC-Aircraft, and Stanford Dogs. Our code will be publicly available online for the research community.

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PublisherInstitute of Electrical and Electronics Engineers Inc.
Number of pages10
ISBN (Electronic)9781665428125
Publication statusPublished - 2021
Event18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada
Duration: 2021 Oct 112021 Oct 17

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
ISSN (Print)1550-5499


Conference18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
CityVirtual, Online

Bibliographical note

Publisher Copyright:
© 2021 IEEE

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition


Dive into the research topics of 'Contrastive Attention Maps for Self-supervised Co-localization'. Together they form a unique fingerprint.

Cite this