1/2019 - 11
Generic Feature Selection Methodology to Named Entity Detection from Indian and European Languages
MALARKODI, C. S., DEVI, S. L.
Author keywords
classification, optimization, feature extraction, fuzzy logic, signal processing
About this article
Date of Publication: 2019-02-28
Volume 19, Issue 1, Year 2019, On page(s): 79 - 88
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2019.01011
Web of Science Accession Number: 000459986900011
SCOPUS ID: 85064208532
Abstract
This paper describes the development of a language- and domain-independent Named Entity Recognition (NER) system that can identify named entities in any given dataset, irrespective of language and domain. The main novelty of the present work is a generic feature selection methodology, which has been applied to 7 Indian languages and 5 European languages. The feature selection was carried out in two steps: first, a frequency-based approach was used; second, the k-means++ clustering algorithm was used to validate the patterns obtained with the frequency-based approach. The datasets used for the experiments belong to different genres. To the best of our knowledge, we are the first to develop a cross-lingual Named Entity (NE) system covering 12 languages that belong to different language families. We performed 10-fold cross-validation, analyzed the system output for all the languages, and discuss the causes of errors in the error analysis section. The performance of our system is also compared with that of existing systems.
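The abstract names k-means++ as the clustering step but does not state how features are encoded, which distance is used, or how many clusters are formed. As an illustration only, the seeding rule that distinguishes k-means++ from plain k-means can be sketched as follows. The function names, the tuple representation of points, and the use of squared Euclidean distance are assumptions made for this sketch and are not taken from the paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Choose k initial centroids using k-means++ seeding.

    The first centroid is drawn uniformly at random. Each later centroid is
    drawn with probability proportional to the squared distance from a point
    to its nearest already-chosen centroid.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = rng or random.Random(0)  # fixed default seed, for repeatability only

    centroids = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest centroid.
    d2 = [squared_distance(p, centroids[0]) for p in points]

    while len(centroids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            nxt = rng.choice(points)
        else:
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(nxt)
        # Only the new centroid can shrink a point's nearest-centroid distance.
        d2 = [min(d, squared_distance(p, nxt)) for p, d in zip(points, d2)]
    return centroids
```

The returned seeds would then be handed to the usual assign-and-update iterations of k-means. How the frequency-based feature patterns are turned into numeric vectors for clustering is not described in the abstract, so that step is left out here.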
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.