In this paper, we address the issue of data imbalance in learning deep models for visual object tracking. Although it is well known that data distribution plays a crucial role in learning and inference models, considerably less attention has been paid to data imbalance in visual tracking. For deep regression trackers, which directly learn a dense mapping from input images of target objects to soft response maps, we identify that their performance is limited by the extremely imbalanced pixel-to-pixel differences that arise when computing the regression loss. This prevents existing end-to-end learnable deep regression trackers from performing as well as discriminative correlation filter (DCF) trackers. For deep classification trackers, which draw positive and negative samples to learn discriminative classifiers, there is heavy class imbalance because positive samples are far fewer than negative samples. To balance the training data, we propose a novel shrinkage loss that penalizes the importance of easy training data, which mostly come from the background; this helps both deep regression and classification trackers better distinguish target objects from the background. We extensively validate the proposed shrinkage loss function on six benchmark datasets: OTB-2013, OTB-2015, UAV-123, VOT-2016, VOT-2018, and LaSOT. Equipped with our shrinkage loss, the proposed one-stage deep regression tracker achieves favorable results against state-of-the-art methods, especially in comparison with DCF trackers. Meanwhile, our shrinkage loss generalizes well to deep classification trackers. When the original binary cross-entropy loss is replaced with our shrinkage loss, three representative baseline trackers achieve large performance gains, even setting new state-of-the-art results.
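The idea of down-weighting easy samples can be illustrated with a minimal sketch. The abstract does not state the loss formula, so everything below is an assumption made for illustration: the sigmoid-shaped modulation factor, the function names `shrinkage_weight`, `l2_terms` and `shrunk_terms`, the parameters `a` (shrinkage speed) and `c` (error level at which the weight is 0.5), their default values, and the toy error values. It is not presented as the authors' exact formulation.

```python
import math

def shrinkage_weight(abs_err, a=10.0, c=0.2):
    """Hypothetical modulation factor: close to 0 for small (easy) errors,
    close to 1 for large (hard) errors. a and c are assumed values."""
    return 1.0 / (1.0 + math.exp(a * (c - abs_err)))

def l2_terms(abs_errs):
    """Plain squared-error contribution of each sample."""
    return [e * e for e in abs_errs]

def shrunk_terms(abs_errs, a=10.0, c=0.2):
    """Squared-error contribution of each sample, scaled by its shrinkage weight."""
    return [shrinkage_weight(e, a, c) * e * e for e in abs_errs]

def easy_share(terms, n_easy):
    """Fraction of the total loss contributed by the first n_easy samples."""
    return sum(terms[:n_easy]) / sum(terms)

# Toy response map: 99 easy background pixels with a small error each,
# followed by 1 hard pixel near the target with a large error (made-up values).
abs_errs = [0.05] * 99 + [0.8]

l2_share = easy_share(l2_terms(abs_errs), n_easy=99)
shrunk_share = easy_share(shrunk_terms(abs_errs), n_easy=99)
print(f"background share of L2 loss:     {l2_share:.3f}")
print(f"background share of shrunk loss: {shrunk_share:.3f}")
```

In this toy setting the many easy background pixels account for a noticeably smaller fraction of the total loss once the modulation factor is applied, while the hard pixel's contribution is left almost unchanged. That is the balancing effect the abstract describes, shown here only under the stated assumptions.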
|Number of pages|16|
|Journal|IEEE Transactions on Pattern Analysis and Machine Intelligence|
|Publication status|Published - 2022 May 1|
Bibliographical note
Publisher Copyright: © 1979-2012 IEEE.
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition
- Computational Theory and Mathematics
- Artificial Intelligence
- Applied Mathematics