In this paper, we propose an online Multi-Object Tracking (MOT) approach which integrates the merits of single object tracking and data association methods in a unified framework to handle noisy detections and frequent interactions between targets. Specifically, for applying single object tracking in MOT, we introduce a cost-sensitive tracking loss based on the state-of-the-art visual tracker, which encourages the model to focus on hard negative distractors during online learning. For data association, we propose Dual Matching Attention Networks (DMAN) with both spatial and temporal attention mechanisms. The spatial attention module generates dual attention maps which enable the network to focus on the matching patterns of the input image pair, while the temporal attention module adaptively allocates different levels of attention to different samples in the tracklet to suppress noisy observations. Experimental results on the MOT benchmark datasets show that the proposed algorithm performs favorably against both online and offline trackers in terms of identity-preserving metrics.
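The temporal attention idea in the abstract, giving different weights to different tracklet samples so that noisy observations contribute less, can be illustrated with a softmax-weighted average. This is only a minimal sketch under stated assumptions: the function names (`softmax`, `attention_pool`), the toy features, and the hand-picked scores are all hypothetical. In the described approach the scores would come from the learned temporal attention module, whose architecture is not given in this abstract.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   scores: Sequence[float]) -> List[float]:
    """Attention-weighted average of per-sample feature vectors.

    `features` holds one vector per tracklet sample; `scores` holds one
    raw attention score per sample (assumption: higher means more trusted).
    """
    if not features or len(features) != len(scores):
        raise ValueError("need at least one sample and one score per sample")
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]


# Toy tracklet (hypothetical data): two clean samples and one noisy sample.
tracklet = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Hand-picked scores standing in for the output of a learned attention module.
raw_scores = [2.0, 2.0, -2.0]
pooled = attention_pool(tracklet, raw_scores)
```

With these scores the noisy third sample receives a weight below 1%, so the pooled feature stays close to the two clean samples. Uniform averaging would instead give the noisy sample a one-third share.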
|Title of host publication||Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings|
|Editors||Vittorio Ferrari, Cristian Sminchisescu, Martial Hebert, Yair Weiss|
|Number of pages||18|
|Publication status||Published - 2018|
|Event||15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany|
|Duration||2018 Sept 8 → 2018 Sept 14|
|Name||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Other||15th European Conference on Computer Vision, ECCV 2018|
|Period||18/9/8 → 18/9/14|
Bibliographical note
Funding Information:
Acknowledgments. This work is supported in part by National Natural Science Foundation of China (NSFC, Grant No. 61771303, 61671289, and 61521062), Science and Technology Commission of Shanghai Municipality (STCSM, Grant No. 17DZ1205602, 18DZ1200102, and 18DZ2270700), SJTU-YITU/Thinkforce Joint Lab of Visual Computing and Application, and Visbody. J. Zhu and N. Liu are supported by a scholarship from China Scholarship Council. M. Kim is supported by the Panasonic Silicon Valley Laboratory. M.-H. Yang acknowledges the support from NSF (Grant No. 1149783) and gifts from Adobe and NVIDIA.
© 2018, Springer Nature Switzerland AG.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science (all)