A Fisher Kernel Approach for Multiple Instance Based Object Retrieval in Video Surveillance

doi:10.4316/AECE.2015.04006

MIRONICA, I., MITREA, C. A., IONESCU, B., LAMBERT, P.

Author keywords
automated video surveillance, Fisher kernel representation, multiple-instance object retrieval

About this article
Date of Publication: 2015-11-30
Volume 15, Issue 4, Year 2015, On page(s): 43 - 52
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2015.04006
Web of Science Accession Number: 000368499800006
SCOPUS ID: 84949964857

Abstract

This paper presents an automated surveillance system that exploits the Fisher Kernel representation for a multiple-instance object retrieval task. The main purpose of the proposed algorithm is to track a list of persons across several video sources using only a few training examples. First, the Fisher Kernel representation describes a set of features as the derivative of the log-likelihood of the generative probability distribution that models the feature distribution, taken with respect to the parameters of that distribution. Then, we learn the generative probability distribution over all features extracted from a reduced set of relevant frames. The proposed approach shows significant improvements, and we demonstrate that Fisher kernels are well suited to this task. We demonstrate the generality of our approach in terms of features by conducting an extensive evaluation with a broad range of keypoint features. We also evaluate our method on two standard video surveillance datasets, attaining superior results compared to state-of-the-art object recognition algorithms.
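To make the Fisher Kernel description above concrete, the following is a minimal sketch of how a set of local descriptors can be encoded as the gradient of a log-likelihood. It assumes a Gaussian mixture model with diagonal covariances as the generative distribution and takes the gradient with respect to the component means only, using a commonly used normalisation. The function name `fisher_vector_means` and its parameters are hypothetical and chosen for illustration; the abstract does not specify the authors' actual model or implementation.

```python
import numpy as np

def fisher_vector_means(X, weights, means, variances):
    """Encode descriptors X (T x D) as the normalised gradient of the average
    GMM log-likelihood with respect to the component means.

    Assumption: diagonal-covariance GMM with K components, given by
    weights (K,), means (K, D) and variances (K, D).
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    T = X.shape[0]

    # Difference between every descriptor and every component mean: (T, K, D)
    diff = X[:, None, :] - means[None, :, :]

    # Log-density of each descriptor under each Gaussian component: (T, K)
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )

    # Posterior responsibilities gamma[t, k], computed stably in the log domain
    log_weighted = log_prob + np.log(weights)[None, :]
    log_norm = np.logaddexp.reduce(log_weighted, axis=1, keepdims=True)
    gamma = np.exp(log_weighted - log_norm)

    # Gradient w.r.t. the means, scaled by 1 / (T * sqrt(w_k)): (K, D)
    grad = np.einsum("tk,tkd->kd", gamma, diff / np.sqrt(variances)[None, :, :])
    grad /= T * np.sqrt(weights)[:, None]

    return grad.ravel()

# Illustrative usage with random descriptors and a hand-set 2-component GMM
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(50, 4))
fv = fisher_vector_means(
    descriptors,
    weights=[0.5, 0.5],
    means=np.stack([np.zeros(4), np.ones(4)]),
    variances=np.ones((2, 4)),
)
```

The resulting vector has one block of D values per mixture component, so a variable-size set of descriptors from a frame becomes a fixed-length representation that a standard classifier can consume. Gradients with respect to the variances and mixture weights could be appended in the same way.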



Copyright ©2001-2024
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania

All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.
