Abstract
Perceiving meaningful activities in a long video sequence is a challenging problem, owing to the ambiguous definition of 'meaningfulness' as well as clutter in the scene. We approach this problem by learning a generative model for regular motion patterns (termed regularity) using multiple sources with very limited supervision. Specifically, we propose two methods built upon autoencoders for their ability to work with little to no supervision. First, we leverage conventional handcrafted spatio-temporal local features and learn a fully connected autoencoder on them. Second, we build a fully convolutional feed-forward autoencoder that learns both the local features and the classifiers in an end-to-end framework. Our model can capture regularities from multiple datasets. We evaluate our methods both qualitatively and quantitatively - showing the learned regularity of videos in various aspects and demonstrating competitive performance on anomaly detection datasets as an application.
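The abstract gives no implementation details, but its core idea - score each frame by how well an autoencoder trained only on regular motion can reconstruct it, so poorly reconstructed frames are flagged as irregular - can be illustrated with a toy sketch. The linear autoencoder, dimensions, training loop, and min-max regularity score below are all illustrative assumptions for this sketch, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "regular" data: frames lying in a 2-D subspace of a 10-D feature space
# (a stand-in for regular motion patterns; purely illustrative).
d, k, n = 10, 2, 200
basis = rng.normal(size=(k, d))          # assumed regular-motion subspace
X = rng.normal(size=(n, k)) @ basis      # training "frames"
X += 0.01 * rng.normal(size=X.shape)     # small observation noise

# Linear autoencoder: encoder W1 (d x k), decoder W2 (k x d),
# trained by plain gradient descent on mean squared reconstruction error.
W1 = 0.1 * rng.normal(size=(d, k))
W2 = 0.1 * rng.normal(size=(k, d))
lr = 0.02
for _ in range(2000):
    H = X @ W1                    # encode
    Xhat = H @ W2                 # decode
    G = 2.0 * (Xhat - X) / n      # gradient of the MSE w.r.t. Xhat
    W2 -= lr * (H.T @ G)          # decoder gradient step
    W1 -= lr * (X.T @ (G @ W2.T)) # encoder gradient step

def recon_error(x):
    """L2 reconstruction error of one frame: its 'irregularity'."""
    return float(np.linalg.norm(x @ W1 @ W2 - x))

# A short test sequence: regular frames plus one frame off the subspace.
seq = [rng.normal(size=k) @ basis for _ in range(9)]
seq.insert(5, 3.0 * rng.normal(size=d))  # injected irregular frame
errors = np.array([recon_error(f) for f in seq])

# Per-frame regularity score via min-max normalization (high = regular);
# this normalization is an assumption of the sketch.
scores = 1.0 - (errors - errors.min()) / (errors.max() - errors.min())
print(int(np.argmin(scores)))  # index of the least regular frame
```

Because the autoencoder only learned the regular subspace, the injected frame reconstructs poorly and receives the lowest regularity score, which is the mechanism that makes such models usable for anomaly detection.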
Original language | English |
---|---|
Title of host publication | Proceedings - 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 |
Publisher | IEEE Computer Society |
Pages | 733-742 |
Number of pages | 10 |
ISBN (Electronic) | 9781467388504 |
DOIs | |
Publication status | Published - 2016 Dec 9 |
Event | 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 - Las Vegas, United States; Duration: 2016 Jun 26 → 2016 Jul 1 |
Publication series
Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
---|---|
Volume | 2016-December |
ISSN (Print) | 1063-6919 |
Conference
Conference | 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 |
---|---|
Country/Territory | United States |
City | Las Vegas |
Period | 16/6/26 → 16/7/1 |
Bibliographical note
Publisher Copyright: © 2016 IEEE.
All Science Journal Classification (ASJC) codes
- Software
- Computer Vision and Pattern Recognition