Abstract
Crowd counting is usually handled as density map regression, supervised via an L2 loss between the predicted density map and the ground truth. To regulate models effectively, various improved L2 loss functions have been developed to find a better correspondence between predicted density and annotation positions. In this paper, we propose to predict the density map at one resolution but measure its quality via a derived log-formed loss at multiple resolutions. Unlike existing methods that assume density maps at different resolutions are independent, our loss is obtained by modeling a likelihood function inspired by the relationship between density maps across multiple resolutions. We find that the traditional single-resolution L2 loss is a particular case of our derived log-likelihood, and we mathematically prove that the derived loss is superior to a single-resolution L2 loss. Without bells and whistles, the proposed loss substantially improves several baselines and performs favorably compared to state-of-the-art methods on five crowd counting datasets: NWPU-Crowd, ShanghaiTech A & B, UCF-QNRF, and JHU-Crowd++. The source code and trained models are released at https://github.com/streamer-AP/PML_Loss.git.
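The idea of predicting at one resolution and scoring at several can be illustrated with a minimal sketch. This is not the loss derived in the paper: the abstract does not give its form, so the code below assumes a simple stand-in that sum-pools both maps to coarser grids and adds up `log(1 + squared error)` over the resolutions. The function names, the pooling factors, and the `log1p` combination are all hypothetical choices made for illustration.

```python
import math


def sum_pool(density, k):
    """Downsample a 2-D density map by summing non-overlapping k x k blocks.

    Summing (rather than averaging) keeps the total count unchanged.
    """
    h, w = len(density), len(density[0])
    if h % k or w % k:
        raise ValueError("map size must be divisible by the pooling factor")
    return [
        [
            sum(density[i + di][j + dj] for di in range(k) for dj in range(k))
            for j in range(0, w, k)
        ]
        for i in range(0, h, k)
    ]


def squared_error(pred, target):
    """Sum of squared differences between two maps of equal shape."""
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )


def multi_resolution_log_loss(pred, target, factors=(1, 2, 4)):
    """Illustrative stand-in (an assumption, not the paper's derived loss).

    The prediction is made once at full resolution; the error is then
    measured after pooling both maps by each factor and combined as a
    sum of log(1 + squared error) terms.
    """
    return sum(
        math.log1p(squared_error(sum_pool(pred, k), sum_pool(target, k)))
        for k in factors
    )


def one_hot_map(size, row, col):
    """Build a size x size map with a single unit of density at (row, col)."""
    grid = [[0.0] * size for _ in range(size)]
    grid[row][col] = 1.0
    return grid


target = one_hot_map(4, 0, 0)
near = one_hot_map(4, 1, 1)   # misplaced, but inside the same 2 x 2 block
far = one_hot_map(4, 3, 3)    # misplaced into a different 2 x 2 block

loss_exact = multi_resolution_log_loss(target, target)
loss_near = multi_resolution_log_loss(near, target)
loss_far = multi_resolution_log_loss(far, target)
```

In this toy setting a plain full-resolution L2 loss scores `near` and `far` identically, since both miss the annotated pixel, whereas the multi-resolution sum penalizes `far` more because its error survives one more level of pooling.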
Original language | English |
---|---|
Pages (from-to) | 3232-3244 |
Number of pages | 13 |
Journal | IEEE Transactions on Circuits and Systems for Video Technology |
Volume | 34 |
Issue number | 5 |
DOIs | |
Publication status | Published - 2024 May 1 |
Bibliographical note
Publisher Copyright: © 2024 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
All Science Journal Classification (ASJC) codes
- Media Technology
- Electrical and Electronic Engineering