Abstract
In this paper, we propose an online Multi-Object Tracking (MOT) approach that integrates the merits of single object tracking and data association methods in a unified framework to handle noisy detections and frequent interactions between targets. Specifically, to apply single object tracking in MOT, we introduce a cost-sensitive tracking loss based on a state-of-the-art visual tracker, which encourages the model to focus on hard negative distractors during online learning. For data association, we propose Dual Matching Attention Networks (DMAN) with both spatial and temporal attention mechanisms. The spatial attention module generates dual attention maps that enable the network to focus on the matching patterns of the input image pair, while the temporal attention module adaptively allocates different levels of attention to different samples in the tracklet to suppress noisy observations. Experimental results on the MOT benchmark datasets show that the proposed algorithm performs favorably against both online and offline trackers in terms of identity-preserving metrics.
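The temporal attention idea described in the abstract — weighting tracklet samples by their relevance to a candidate detection so that noisy observations are suppressed — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact DMAN formulation: the function name, the use of cosine similarity as the relevance score, and the softmax weighting are all assumptions for illustration.

```python
import numpy as np

def temporal_attention_pool(tracklet_feats, target_feat):
    """Sketch of temporal attention over a tracklet (assumed formulation).

    tracklet_feats: (T, D) array of appearance features, one per tracklet sample.
    target_feat:    (D,) feature of the candidate detection to match against.
    Returns attention weights over the T samples and the weighted aggregate.
    """
    # Cosine similarity between each tracklet sample and the candidate feature;
    # noisy/occluded samples tend to score low and thus receive little weight.
    t = tracklet_feats / np.linalg.norm(tracklet_feats, axis=1, keepdims=True)
    q = target_feat / np.linalg.norm(target_feat)
    scores = t @ q
    # Softmax turns the similarities into attention weights that sum to 1.
    w = np.exp(scores - scores.max())
    w /= w.sum()
    # Attention-weighted aggregate of the tracklet features.
    return w, w @ tracklet_feats

# Toy usage: five random tracklet features, candidate close to the first one.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
target = feats[0] + 0.1 * rng.normal(size=8)
weights, pooled = temporal_attention_pool(feats, target)
```

In this toy run the first tracklet sample, being most similar to the candidate, receives the largest attention weight, while dissimilar (potentially noisy) samples are down-weighted in the pooled representation.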
Original language | English |
---|---|
Title of host publication | Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings |
Editors | Vittorio Ferrari, Cristian Sminchisescu, Martial Hebert, Yair Weiss |
Publisher | Springer Verlag |
Pages | 379-396 |
Number of pages | 18 |
ISBN (Print) | 9783030012274 |
DOIs | |
Publication status | Published - 2018 |
Event | 15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany |
Duration | 2018 Sept 8 → 2018 Sept 14 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11209 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Other
Other | 15th European Conference on Computer Vision, ECCV 2018 |
---|---|
Country/Territory | Germany |
City | Munich |
Period | 2018 Sept 8 → 2018 Sept 14 |
Bibliographical note
Funding Information: Acknowledgments. This work is supported in part by the National Natural Science Foundation of China (NSFC, Grant No. 61771303, 61671289, and 61521062), the Science and Technology Commission of Shanghai Municipality (STCSM, Grant No. 17DZ1205602, 18DZ1200102, and 18DZ2270700), the SJTU-YITU/Thinkforce Joint Lab of Visual Computing and Application, and Visbody. J. Zhu and N. Liu are supported by a scholarship from the China Scholarship Council. M. Kim is supported by the Panasonic Silicon Valley Laboratory. M.-H. Yang acknowledges the support from NSF (Grant No. 1149783) and gifts from Adobe and NVIDIA.
Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Computer Science (all)