Both weakly supervised single object localization and semantic segmentation techniques learn an object's location using only image-level labels. However, these techniques tend to cover only the most discriminative part of the object rather than the entire object. To address this problem, we propose an attention-based dropout layer, which utilizes the attention mechanism to locate the entire object efficiently. To achieve this, we devise two key components: 1) hiding the most discriminative part from the model so that it captures the entire object, and 2) highlighting the informative region to improve the classification power of the model. Together, these allow the classifier to maintain reasonable accuracy while covering the entire object. Through extensive experiments, we demonstrate that the proposed method effectively improves weakly supervised single object localization accuracy, achieving new state-of-the-art localization accuracy on CUB-200-2011 and accuracy comparable to existing state-of-the-art methods on ImageNet-1k. The proposed method is also effective in improving weakly supervised semantic segmentation performance on Pascal VOC and MS COCO. Furthermore, the proposed method is more efficient than existing techniques in terms of parameter and computation overheads. Additionally, the proposed method can be easily applied to various backbone networks.
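The two components named in the abstract can be illustrated with a minimal sketch. The abstract does not give the layer's details, so everything concrete here is an assumption made for illustration: the attention map is taken as the channel-wise mean of the feature map, the "hiding" mask zeroes positions above a fraction `gamma` of the peak attention, the "highlighting" map is a sigmoid of the attention, and one of the two is picked at random with probability `drop_rate`. The function names (`attention_map`, `drop_mask`, `importance_map`, `attention_dropout`) and default values are hypothetical, not the authors' code.

```python
import math
import random


def attention_map(features):
    """Channel-wise mean of a feature map indexed as features[c][h][w] (assumed form)."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [sum(features[c][i][j] for c in range(channels)) / channels
         for j in range(width)]
        for i in range(height)
    ]


def drop_mask(attention, gamma):
    """Component 1 (hiding): 0.0 where attention exceeds gamma * peak, else 1.0."""
    peak = max(max(row) for row in attention)
    threshold = gamma * peak
    return [[0.0 if a > threshold else 1.0 for a in row] for row in attention]


def importance_map(attention):
    """Component 2 (highlighting): sigmoid of the attention, values in (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-a)) for a in row] for row in attention]


def attention_dropout(features, drop_rate=0.75, gamma=0.9, rng=random):
    """Apply either the drop mask or the importance map to every channel.

    drop_rate and gamma are illustrative hyperparameters (assumptions).
    """
    attention = attention_map(features)
    if rng.random() < drop_rate:
        selected = drop_mask(attention, gamma)
    else:
        selected = importance_map(attention)
    # The same spatial map is broadcast over all channels.
    return [
        [[value * selected[i][j] for j, value in enumerate(row)]
         for i, row in enumerate(channel)]
        for channel in features
    ]
```

In this sketch the layer adds no trainable parameters, since both maps are computed directly from the incoming features; that is in line with the abstract's claim of low parameter and computation overhead, though the actual design may differ. A layer like this would presumably be active only during training and skipped at inference, which is also an assumption here.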
|Number of pages|16|
|Journal|IEEE Transactions on Pattern Analysis and Machine Intelligence|
|Publication status|Published - 2021 Dec 1|
Bibliographical note
Publisher Copyright:
© 1979-2012 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition
- Computational Theory and Mathematics
- Artificial Intelligence
- Applied Mathematics