1/2014 - 10
Fast Decision Tree Algorithm
PURDILA, V., PENTIUC, S.-G.
Author keywords
algorithm, chi-merge, classification, data compression, decision tree, pruning
About this article
Date of Publication: 2014-02-28
Volume 14, Issue 1, Year 2014, On page(s): 65 - 68
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2014.01010
Web of Science Accession Number: 000332062300010
SCOPUS ID: 84894631111
Abstract
There is growing interest in processing large amounts of data with well-known decision-tree learning algorithms. It is essential to build a decision tree from a large dataset as quickly as possible, without a substantial decrease in accuracy and using as little memory as possible. In this paper we present an improved C4.5 algorithm that uses a compression mechanism to store the training and test data in memory. We also present a very fast tree pruning algorithm. Our experiments show that, in most cases, the presented algorithms outperform C5.0 in speed and classification accuracy, at the expense of tree size: the resulting trees are larger than those produced by C5.0. The data compression and pruning algorithms can be easily parallelized to achieve further speedup.
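For readers unfamiliar with C4.5, the sketch below illustrates the standard gain-ratio criterion that C4.5 uses to choose a split attribute. It is not the authors' improved algorithm: the abstract does not describe the compression or pruning steps in enough detail to reproduce them, so neither is shown here. The function names `entropy` and `gain_ratio` are hypothetical, and the example assumes a single categorical attribute held in plain Python lists.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on the categorical attribute `values`.

    Illustrative helper only (an assumption, not taken from the paper).
    """
    total = len(labels)
    # Group the class labels by attribute value.
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    outlook = ["sunny", "sunny", "rain", "rain"]
    play = ["no", "no", "yes", "yes"]
    print(gain_ratio(outlook, play))
```

A tree builder would evaluate `gain_ratio` for every candidate attribute at a node and split on the one with the highest value. An attribute with a single distinct value has zero split information, so the sketch returns 0.0 for it rather than dividing by zero.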
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.