Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets

Junsuk Choe, Seong Joon Oh, Sanghyuk Chun, Seungho Lee, Zeynep Akata, Hyunjung Shim

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Weakly-supervised object localization (WSOL) has gained popularity over the last years for its promise to train localization models with only image-level labels. Since the seminal WSOL work of class activation mapping (CAM), the field has focused on how to expand the attention regions to cover objects more broadly and localize them better. However, these strategies rely on full localization supervision for validating hyperparameters and model selection, which is in principle prohibited under the WSOL setup. In this paper, we argue that the WSOL task is ill-posed with only image-level labels, and propose a new evaluation protocol where full supervision is limited to only a small held-out set not overlapping with the test set. We observe that, under our protocol, the five most recent WSOL methods have not made a major improvement over the CAM baseline. Moreover, we report that existing WSOL methods have not reached the few-shot learning baseline, where the full supervision at validation time is used for model training instead. Based on our findings, we discuss some future directions for WSOL. Source code and dataset are available at https://github.com/clovaai/wsolevaluation.

Original language: English
Pages (from-to): 1732-1748
Number of pages: 17
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 45
Issue number: 2
Publication status: Published - 2023 Feb 1

Bibliographical note

Publisher Copyright:
© 1979-2012 IEEE.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
