1/2011 - 13
Domain Independent Vocabulary Generation and Its Use in Category-based Small Footprint Language Model
KIM, K.-H., KIM, J.-H.
Author keywords
natural language processing, speech recognition
References keywords
language(16), speech(12), spoken(5), recognition(5), processing(5), modeling(5), model(5), vocabulary(4), statistical(4), gram(4)
About this article
Date of Publication: 2011-02-27
Volume 11, Issue 1, Year 2011, On page(s): 77 - 84
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2011.01013
Web of Science Accession Number: 000288761800013
SCOPUS ID: 79955973325
Abstract
This paper addresses domain independent vocabulary generation and its use in a category-based small footprint Language Model (LM). Conventional LMs face two major constraints in embedded environments: limited memory capacity and data sparsity in domain-specific applications. Data sparsity adversely affects both vocabulary coverage and LM performance. To overcome these constraints, we define a set of domain independent categories using a Part-Of-Speech (POS) tagged corpus. We then generate a domain independent vocabulary based on this set, using the corpus and a knowledge base. Next, we propose a mathematical framework for a category-based LM built on this set, in which one word can be assigned multiple categories. To reduce its memory requirements, we propose a tree-based data structure. In addition, we determine the history length of the category n-gram and the independence assumption applied to category history generation. The proposed vocabulary generation method shows at least a 13.68% relative improvement in coverage on an SMS text corpus, where data are sparse because of difficulties in data collection. The proposed category-based LM requires only 215 KB, which is 55% and 13% relative to the conventional category-based LM and the word-based LM, respectively. It also improves performance, achieving 54.9% and 60.6% reductions in normalized perplexity compared to the conventional category-based LM and the word-based LM, respectively.
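To make the idea of a category-based LM with multiply-categorized words more concrete, the following is a minimal sketch of a generic category bigram model, not the framework proposed in the paper. It assumes a history length of one category and sums over every category a word may belong to (a forward-style recursion). The category names, the probability tables `word_given_cat` and `cat_bigram`, and the functions `sentence_probability` and `perplexity` are all hypothetical and chosen only for illustration; the paper's tree-based storage and its normalized perplexity measure are not reproduced here.

```python
import math

# Hypothetical toy parameters, invented for illustration only.
# P(word | category): "book" is listed under two categories on purpose.
word_given_cat = {
    "NOUN": {"book": 0.5, "flight": 0.5},
    "VERB": {"book": 0.4, "take": 0.6},
    "DET": {"a": 1.0},
}

# P(category | previous category); "<s>" marks the sentence start.
cat_bigram = {
    "<s>": {"VERB": 0.6, "DET": 0.3, "NOUN": 0.1},
    "VERB": {"DET": 0.7, "NOUN": 0.2, "VERB": 0.1},
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"NOUN": 0.3, "VERB": 0.5, "DET": 0.2},
}


def sentence_probability(words):
    """Return P(words), summing over all category sequences."""
    # alpha[c] = P(words seen so far, current category == c)
    alpha = {"<s>": 1.0}
    for word in words:
        new_alpha = {}
        for cat, emissions in word_given_cat.items():
            p_word = emissions.get(word, 0.0)
            if p_word == 0.0:
                continue  # this category cannot produce the word
            # Probability of reaching `cat` from any previous category.
            p_cat = sum(
                prob * cat_bigram[prev].get(cat, 0.0)
                for prev, prob in alpha.items()
            )
            new_alpha[cat] = p_cat * p_word
        alpha = new_alpha
    return sum(alpha.values())


def perplexity(words):
    """Per-word perplexity; infinite if the model assigns zero probability."""
    p = sentence_probability(words)
    if p == 0.0:
        return math.inf
    return math.exp(-math.log(p) / len(words))


if __name__ == "__main__":
    sentence = ["book", "a", "flight"]
    print(sentence_probability(sentence))
    print(perplexity(sentence))
```

Because the model stores one table over categories plus per-category word lists instead of a full word n-gram table, its parameter count grows with the number of categories rather than with the square of the vocabulary size, which is the general motivation for category-based models in memory-constrained settings.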
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava.