4/2017 - 10
K-Linkage: A New Agglomerative Approach for Hierarchical Clustering
YILDIRIM, P., BIRANT, D.
Author keywords
clustering, data mining, data processing, knowledge discovery, unsupervised learning
About this article
Date of Publication: 2017-11-30
Volume 17, Issue 4, Year 2017, On page(s): 77 - 88
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2017.04010
Web of Science Accession Number: 000417674300010
SCOPUS ID: 85035794377
Abstract
In agglomerative hierarchical clustering, the traditional approaches to computing the distance between clusters are single, complete, average and centroid linkage. However, the single-link and complete-link approaches cannot always reflect the true underlying relationship between clusters, because they consider only a single pair of observations between two clusters. This may promote the formation of spurious clusters. To overcome this problem, this paper proposes a novel approach, named k-Linkage, which calculates the distance by considering k observations from two clusters separately. The article also introduces two novel concepts: k-min linkage (the average of the k closest pairs) and k-max linkage (the average of the k farthest pairs). In the experimental studies, the improved hierarchical clustering algorithm based on k-Linkage was executed on five well-known benchmark datasets with varying k values to demonstrate its efficiency. The results show that the proposed k-Linkage method can often produce clusters with better accuracy than the single, complete, average and centroid linkages.
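The two linkage definitions in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' implementation: it assumes Euclidean distance, assumes that the "pairs" are all cross-cluster pairs of observations, and clamps k to the number of available pairs. The names `pairwise_distances` and `k_linkage` are hypothetical and do not come from the paper.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distance between every point in a and every point in b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a[:, None, :] - b[None, :, :]  # shape (len(a), len(b), dims)
    return np.sqrt((diff ** 2).sum(axis=2))


def k_linkage(a, b, k, mode="min"):
    """Average of the k closest (mode="min") or k farthest (mode="max")
    cross-cluster pair distances. Assumption: k is clamped to the number
    of cross-cluster pairs; the paper may handle small clusters differently."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    d = np.sort(pairwise_distances(a, b), axis=None)  # flattened, ascending
    k = min(k, d.size)
    chosen = d[:k] if mode == "min" else d[-k:]
    return float(chosen.mean())


# Two small one-dimensional-looking clusters in 2-D space.
cluster_a = [[0, 0], [1, 0]]
cluster_b = [[3, 0], [5, 0]]

print(k_linkage(cluster_a, cluster_b, k=1, mode="min"))  # same as single linkage
print(k_linkage(cluster_a, cluster_b, k=1, mode="max"))  # same as complete linkage
print(k_linkage(cluster_a, cluster_b, k=2, mode="min"))  # k-min linkage
print(k_linkage(cluster_a, cluster_b, k=2, mode="max"))  # k-max linkage
```

Under this reading, k = 1 reduces k-min and k-max linkage to single and complete linkage respectively, and setting k to the total number of cross-cluster pairs gives average linkage, so k interpolates between the traditional measures.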
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.