Abstract
The goal of unsupervised co-localization is to locate the object in a scene under the assumptions that 1) the dataset consists of only one superclass, e.g., birds, and 2) there are no human-annotated labels in the dataset. The most recent method achieves impressive co-localization performance by employing self-supervised representation learning approaches such as predicting rotation. In this paper, we introduce a new contrastive objective applied directly to the attention maps to enhance co-localization performance. Our contrastive loss function exploits rich location information, which induces the model to activate the extent of the object effectively. In addition, we propose a pixel-wise attention pooling that selectively aggregates the feature map according to its magnitudes across channels. Our methods are simple and are shown to be effective by extensive qualitative and quantitative evaluation, achieving state-of-the-art co-localization performance by large margins on four datasets: CUB-200-2011, Stanford Cars, FGVC-Aircraft, and Stanford Dogs. Our code will be made publicly available online for the research community.
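As a rough illustration of the pooling idea mentioned in the abstract, the sketch below weights each spatial location of a feature map by the magnitude of its feature vector across channels and then sums over locations. The abstract does not give the formulation, so the function name `pixelwise_attention_pool`, the use of the L2 norm as the magnitude, and the sum-to-one normalization are all assumptions made here for illustration, not the authors' published method.

```python
import numpy as np


def pixelwise_attention_pool(features, eps=1e-8):
    """Pool a (C, H, W) feature map into a (C,) vector.

    Assumption (not taken from the paper): the attention at each pixel
    is the L2 norm of its feature vector across channels, normalized
    so that the weights sum to 1 over all spatial positions.
    """
    # Per-pixel magnitude across channels: shape (H, W)
    magnitude = np.linalg.norm(features, axis=0)
    # Normalize the magnitudes into spatial weights
    attention = magnitude / (magnitude.sum() + eps)
    # Attention-weighted sum over spatial positions: shape (C,)
    pooled = (features * attention[None, :, :]).sum(axis=(1, 2))
    return pooled, attention


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4, 4))  # hypothetical feature map
    pooled, attn = pixelwise_attention_pool(feats)
    print(pooled.shape, attn.shape)
```

Compared with global average pooling, which weights every location equally, a magnitude-based weighting of this kind lets strongly activated locations dominate the pooled descriptor. The returned attention map could in principle be thresholded to obtain a localization mask, although that step is also an assumption here.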
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 2783-2792 |
Number of pages | 10 |
ISBN (Electronic) | 9781665428125 |
Publication status | Published - 2021 |
Event | 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada; Duration: 2021 Oct 11 → 2021 Oct 17 |
Publication series
Name | Proceedings of the IEEE International Conference on Computer Vision |
---|---|
ISSN (Print) | 1550-5499 |
Conference
Conference | 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 |
---|---|
Country/Territory | Canada |
City | Virtual, Online |
Period | 2021 Oct 11 → 2021 Oct 17 |
Bibliographical note
Publisher Copyright: © 2021 IEEE
All Science Journal Classification (ASJC) codes
- Software
- Computer Vision and Pattern Recognition