Abstract
Weakly-supervised temporal action localization is a very challenging problem because frame-wise labels are not given in the training stage; the only hint is video-level labels, which indicate whether each video contains action frames of interest. Previous methods aggregate frame-level class scores to produce a video-level prediction and learn from video-level action labels. This formulation does not fully model the problem, in that background frames are forced to be misclassified as action classes in order to predict video-level labels accurately. In this paper, we design the Background Suppression Network (BaS-Net), which introduces an auxiliary class for background and has a two-branch weight-sharing architecture with an asymmetrical training strategy. This enables BaS-Net to suppress activations from background frames and thereby improve localization performance. Extensive experiments demonstrate the effectiveness of BaS-Net and its superiority over state-of-the-art methods on the most popular benchmarks, THUMOS'14 and ActivityNet. Our code and the trained model are available at https://github.com/Pilhyeon/BaSNet-pytorch.
Original language | English |
---|---|
Title of host publication | AAAI 2020 - 34th AAAI Conference on Artificial Intelligence |
Publisher | AAAI press |
Pages | 11320-11327 |
Number of pages | 8 |
ISBN (Electronic) | 9781577358350 |
Publication status | Published - 2020 |
Event | 34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States. Duration: 2020 Feb 7 → 2020 Feb 12 |
Publication series
Name | AAAI 2020 - 34th AAAI Conference on Artificial Intelligence |
---|---|
Conference
Conference | 34th AAAI Conference on Artificial Intelligence, AAAI 2020 |
---|---|
Country/Territory | United States |
City | New York |
Period | 2020 Feb 7 → 2020 Feb 12 |
Bibliographical note
Funding Information: This project was partly supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (NRF-2017M3C4A7069370) and the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (No. 2019-0-01558: Study on audio, video, 3d map and activation map generation system using deep generative model).
Publisher Copyright:
© 2020, Association for the Advancement of Artificial Intelligence.
All Science Journal Classification (ASJC) codes
- Artificial Intelligence