3/2015 - 20
Evaluation of Subspace Clustering Using Internal Validity Measures
OSZUST, M., KOSTKA, M.
Author keywords
pattern recognition, data mining, subspace clustering, clustering validation, distance metrics
References keywords
clustering(19), data(13), information(9), subspace(8), algorithms(7), measures(6), machine(6), evaluation(6), systems(5), review(5)
About this article
Date of Publication: 2015-08-31
Volume 15, Issue 3, Year 2015, On page(s): 141 - 146
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2015.03020
Web of Science Accession Number: 000360171500020
SCOPUS ID: 84940728824
Abstract
Different clustering algorithms, or even the same algorithm with different input parameters, can produce different partitions of the same data. Clustering validity measures are then applied to determine which results are of better quality. External measures can be used to evaluate clustering algorithms on datasets with a known division of the data. In a real scenario, however, such information is not available, so internal measures are often applied instead. Subspace clustering techniques can create clusters that use different subsets of the full feature space. For this reason, calculating internal measures with full-feature-space distance metrics (e.g., Euclidean distance) is not justified. In this paper, we propose a novel approach to evaluating subspace clustering with internal quality measures: we apply distance metrics that can handle missing attribute values or that are used in dimensionality reduction techniques. The approach is verified on eight publicly available, widely used datasets. The results are promising and support recommending the proposed distance metrics for calculating the examined internal validation measures.
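As an illustration of the idea in the abstract, the sketch below computes an internal measure (mean silhouette width) with a distance that tolerates missing attribute values. It is not the authors' implementation: the abstract does not name the metrics used, so the rescaled partial Euclidean distance, the helper names (`partial_distance`, `mask_to_subspace`, `silhouette`), the toy data, and the choice to mark attributes outside a cluster's subspace as missing are all assumptions made for this example.

```python
import math

def partial_distance(a, b):
    """Euclidean distance over attributes present (not None) in both points.

    The squared sum is rescaled by total/shared dimensions so that distances
    computed over different numbers of attributes stay comparable.
    Assumption: this is one common missing-value distance, not necessarily
    the one used in the paper.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("points share no attributes")
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def mask_to_subspace(point, dims):
    """Keep the attributes listed in dims; mark all others as missing."""
    return [v if i in dims else None for i, v in enumerate(point)]

def silhouette(points, labels, dist):
    """Mean silhouette width of a hard partition under the given distance."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or single-cluster partition
            continue
        a = sum(own) / len(own)                                # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: cluster 0 is compact in dims {0, 1} (dim 2 is noise),
# cluster 1 is compact in dims {1, 2} (dim 0 is noise).
points = [
    [1.0, 1.0, 9.0], [1.2, 0.9, -7.0], [0.9, 1.1, 3.0],
    [8.0, 5.0, 5.0], [-6.0, 5.1, 4.9], [2.0, 4.9, 5.2],
]
labels = [0, 0, 0, 1, 1, 1]
subspaces = {0: {0, 1}, 1: {1, 2}}

masked = [mask_to_subspace(p, subspaces[c]) for p, c in zip(points, labels)]

# With no missing values, partial_distance reduces to plain Euclidean distance.
full_score = silhouette(points, labels, partial_distance)
masked_score = silhouette(masked, labels, partial_distance)

print(f"full-space silhouette: {full_score:.3f}")
print(f"subspace-aware silhouette: {masked_score:.3f}")
```

On this toy data the full-space score is pulled down by the noisy, irrelevant attributes, while the masked version scores the same partition much higher. Note that the sketch requires every pair of points to share at least one attribute; disjoint subspaces would need a different fallback.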
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania