Learning recurrent memory activation networks for visual tracking

Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)


Facilitated by deep neural networks, numerous tracking methods have made significant advances. Existing deep trackers mainly utilize independent frames to model the target appearance, while paying less attention to its temporal coherence. In this paper, we propose a recurrent memory activation network (RMAN) to exploit the untapped temporal coherence of the target appearance for visual tracking. We build the RMAN on top of the long short-term memory network (LSTM) with an additional memory activation layer. Specifically, we first use the LSTM to model the temporal changes of the target appearance. Then we selectively activate the memory blocks via the activation layer to produce a temporally coherent representation. The recurrent memory activation layer enriches the target representations from independent frames and reduces the background interference through temporal consistency. The proposed RMAN is fully differentiable and can be optimized end-to-end. To facilitate network training, we propose a temporal coherence loss together with the original binary classification loss. Extensive experimental results on standard benchmarks demonstrate that our method performs favorably against the state-of-the-art approaches.
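To make the idea concrete, the sketch below illustrates the recurrence described in the abstract: an LSTM step models frame-to-frame appearance changes, and a separate activation layer selectively passes the memory to form a temporally coherent representation. This is an illustrative toy in plain Python, not the authors' implementation; the scalar weights, the `memory_activation` gating rule, and its threshold are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCellSketch:
    """Minimal scalar LSTM cell (single feature dimension) for illustration."""
    def __init__(self, w=0.5, u=0.5, b=0.0):
        # Toy weights shared across gates; a real model learns separate ones.
        self.w, self.u, self.b = w, u, b

    def step(self, x, h, c):
        z = self.w * x + self.u * h + self.b
        f = sigmoid(z)       # forget gate
        i = sigmoid(z)       # input gate
        o = sigmoid(z)       # output gate
        g = math.tanh(z)     # candidate memory
        c = f * c + i * g    # update cell memory
        h = o * math.tanh(c) # hidden state
        return h, c

def memory_activation(h, c, threshold=0.1):
    """Hypothetical activation layer: release the memory only when the hidden
    state is confident enough, suppressing background-driven responses."""
    gate = sigmoid(h)
    return gate * c if abs(h) > threshold else 0.0

# Run a short "video" of per-frame target-appearance scores through the loop.
cell = LSTMCellSketch()
h, c = 0.0, 0.0
coherent = []
for x in [0.9, 0.8, 0.1, 0.85]:  # frame-level appearance responses (made up)
    h, c = cell.step(x, h, c)
    coherent.append(memory_activation(h, c))
```

The temporal coherence loss mentioned in the abstract could then, for example, penalize large differences between consecutive entries of `coherent` alongside the binary classification loss, though the paper's exact formulation should be consulted for details.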

Original language: English
Article number: 9269487
Pages (from-to): 725-738
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Publication status: Published - 2021

Bibliographical note

Publisher Copyright:
© 1992-2012 IEEE.

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Graphics and Computer-Aided Design

