PUBLISHER

Stefan cel Mare
University of Suceava
Faculty of Electrical Engineering and
Computer Science
13, Universitatii Street
Suceava - 720229
ROMANIA

Print ISSN: 1582-7445
Online ISSN: 1844-7600
WorldCat: 643243560
doi: 10.4316/AECE


  1/2011 - 13

Domain Independent Vocabulary Generation and Its Use in Category-based Small Footprint Language Model

KIM, K.-H., KIM, J.-H.
 

Author keywords
natural language processing, speech recognition

References keywords
language(16), speech(12), spoken(5), recognition(5), processing(5), modeling(5), model(5), vocabulary(4), statistical(4), gram(4)

About this article
Date of Publication: 2011-02-27
Volume 11, Issue 1, Year 2011, On page(s): 77 - 84
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2011.01013
Web of Science Accession Number: 000288761800013
SCOPUS ID: 79955973325

Abstract
This paper addresses domain independent vocabulary generation and its use in a category-based small footprint Language Model (LM). Two major constraints of conventional LMs in an embedded environment are limited memory capacity and data sparsity in domain-specific applications. Data sparsity adversely affects both vocabulary coverage and LM performance. To overcome these constraints, we define a set of domain independent categories using a Part-Of-Speech (POS) tagged corpus. We then generate a domain independent vocabulary based on this set, using the corpus and a knowledge base. Next, we propose a mathematical framework for a category-based LM using this set. In this LM, one word can be assigned multiple categories. To reduce its memory requirements, we propose a tree-based data structure. In addition, we determine the history length of a category n-gram and the independence assumption applied to category history generation. The proposed vocabulary generation method shows at least 13.68% relative improvement in coverage for an SMS text corpus, where data are sparse due to the difficulties in data collection. The proposed category-based LM requires only 215 KB, which is 55% and 13% of the memory required by the conventional category-based LM and the word-based LM, respectively. It also successfully improves performance, achieving 54.9% and 60.6% reductions in normalized perplexity compared to the conventional category-based LM and the word-based LM, respectively.
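As a rough illustration of the idea that one word can belong to several categories, the following Python sketch scores a sentence with a toy category bigram model. It is not the paper's framework: the category names, the probability tables, the first-order category history, and the helper names (`sentence_probability`, `perplexity`, `relative_reduction`) are all assumptions made for this example. The sentence probability sums over every category sequence that can emit the words, using the standard forward recursion.

```python
import math

# Illustrative assumption: P(word | category). "call" belongs to two categories.
word_given_cat = {
    "NOUN": {"call": 0.2, "message": 0.8},
    "VERB": {"call": 0.6, "send": 0.4},
}

# Illustrative assumption: P(category | previous category); "<s>" marks sentence start.
cat_given_prev = {
    "<s>": {"NOUN": 0.3, "VERB": 0.7},
    "NOUN": {"NOUN": 0.4, "VERB": 0.6},
    "VERB": {"NOUN": 0.9, "VERB": 0.1},
}


def sentence_probability(words):
    """Sum the probability of the words over all category sequences (forward recursion)."""
    alpha = {"<s>": 1.0}  # probability mass per most recent category
    for word in words:
        new_alpha = {}
        for cat, emissions in word_given_cat.items():
            p_word = emissions.get(word, 0.0)
            if p_word == 0.0:
                continue  # this category cannot emit the word
            p_cat = sum(mass * cat_given_prev[prev].get(cat, 0.0)
                        for prev, mass in alpha.items())
            if p_cat > 0.0:
                new_alpha[cat] = p_cat * p_word
        alpha = new_alpha
    return sum(alpha.values())


def perplexity(words):
    """Per-word perplexity; infinite when the model assigns zero probability."""
    p = sentence_probability(words)
    if not words or p == 0.0:
        return math.inf
    return p ** (-1.0 / len(words))


def relative_reduction(baseline, proposed):
    """Relative reduction of a metric such as perplexity, as a fraction of the baseline."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    print(sentence_probability(["call"]))             # mass from both NOUN and VERB
    print(perplexity(["send", "message"]))
    print(relative_reduction(200.0, 80.0))            # made-up perplexities
```

In this toy setup the single word "call" receives probability from both the NOUN and the VERB path, which is the effect of multiple category membership. A memory-constrained implementation would store these tables far more compactly; the plain dictionaries here are only for readability.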


References

[1] S. Young, "A Review of Large Vocabulary Continuous Speech Recognition," IEEE Signal Processing Magazine, vol. 13, no. 5, pp. 45-57, 1996.
[CrossRef] [SCOPUS Times Cited 284]


[2] S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK version 3.2), Cambridge University Engineering Department, 2002.

[3] K. Lee, H. Hon, and R. Reddy, "An Overview of the SPHINX Speech Recognition System," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 1, pp. 35-45, 1990.
[CrossRef] [Web of Science Times Cited 165] [SCOPUS Times Cited 279]


[4] J. Novak, P. Dixon, and S. Furui, "An Empirical Comparison of the T3, Juicer, HDecode and Sphinx3 Decoders," Proc. Interspeech, pp. 1890-1893, 2010.

[5] M. Adda-Decker and L. Lamel, "The Use of Lexica in Automatic Speech Recognition," Lexicon Development for Speech and Language Processing, F. van Eynde, D. Gibbon (Eds.), Kluwer Academic, pp. 235-266, 2000.

[6] R. Rosenfeld, "Optimizing Lexical and N-gram Coverage via Judicious Use of Linguistic Data," Proc. Eurospeech, pp. 1763-1766, 1995.

[7] S. Adolphs and N. Schmitt, "Lexical Coverage of Spoken Discourse," Applied Linguistics, vol. 24, no. 4, pp. 425-438, 2003.
[CrossRef] [Web of Science Times Cited 118] [SCOPUS Times Cited 147]


[8] P. Nation and R. Waring, "Vocabulary Size, Text Coverage and Word Lists," Vocabulary: Description, Acquisition and Pedagogy, N. Schmitt, M. McCarthy (Eds.), Cambridge University Press, pp. 6-19, 1997.

[9] S. Katz, "Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, no. 3, pp. 400-401, 1987.
[CrossRef] [Web of Science Times Cited 595] [SCOPUS Times Cited 1070]


[10] F. Bechet, Y. Esteve, and R. Mori, "Tree-based Language Model Dedicated to Natural Spoken Dialog System," Proc. International Symposium on Computer Architecture, pp. 207-210, 2001.

[11] C. Troncoso and T. Kawahara, "Trigger-based Language Model Adaptation for Automatic Meeting Transcription," Proc. Interspeech, pp. 1297-1300, 2005.

[12] I. Zitouni, K. Smaili, and J. Haton, "Statistical Language Modeling Based on Variable-length Sequences," Computer Speech and Language, vol. 17, no. 1, pp. 27-41, 2003.
[CrossRef] [Web of Science Times Cited 9] [SCOPUS Times Cited 13]


[13] I. Zitouni, "Backoff Hierarchical Class N-gram Language Models: Effectiveness to Model Unseen Events in Speech Recognition," Computer Speech and Language, vol. 21, no. 1, pp. 88-104, 2007.
[CrossRef] [Web of Science Times Cited 19] [SCOPUS Times Cited 28]


[14] H. Yamamoto, S. Isogai, and Y. Sagisaka, "Multi-class Composite N-gram Language Model," Speech Communication, vol. 41, no. 2, pp. 369-379, 2003.
[CrossRef] [Web of Science Times Cited 29] [SCOPUS Times Cited 41]


[15] P. Brown, V. Della Pietra, P. deSouza, J. Lai, and R. Mercer, "Class-based n-gram Models of Natural Language," Computational Linguistics, vol. 18, no. 4, pp. 467-479, 1992.

[16] R. Kneser and H. Ney, "Improved Clustering Techniques for Class-based Statistical Language Modeling," Proc. Eurospeech, pp. 973-976, 1993.

[17] S. Johansson, "Word Frequency and Text Type: Some Observations Based on the LOB Corpus of British English Texts," Computers and the Humanities, vol. 19, no. 1, pp. 23-36, 1985.
[CrossRef] [Web of Science Times Cited 10] [SCOPUS Times Cited 13]


[18] C. Fellbaum, WordNet: An Electronic Lexical Database, MIT Press, 1998.

[19] G. Leech, P. Rayson, and A. Wilson, Word Frequencies in Written and Spoken English: Based on the British National Corpus, Pearson ESL, 2001.

[20] R. Ordelman, A. Hessen, and F. Jong, "Lexicon Optimization for Dutch Speech Recognition in Spoken Document Retrieval," Proc. Eurospeech, pp. 1085-1088, 2001.

[21] J. Burger, J. Henderson, and W. Morgan, "Statistical Named Entity Recognizer Adaptation," Proc. Natural Language Learning, pp. 1-4, 2002.

[22] T. Hasegawa and S. Sekine, "Discovering Relations Among Named Entities from Large Corpora," Proc. Association for Computational Linguistics, pp. 415-442, 2004.

[23] H. Blockeel, L. Raedt, and J. Ramon, "Top-down Induction of Clustering Trees," Proc. International Conference on Machine Learning, pp. 55-63, 1998.

[24] P. Banerjee and H. Han, "Language Modeling Approaches to Information Retrieval," Journal of Computing Science and Engineering, vol. 3, no. 3, pp. 143-164, 2009.

[25] N. Schmitt and M. McCarthy, Vocabulary: Description, Acquisition and Pedagogy, Cambridge University Press, pp. 6-19, 1997.

[26] P. Clarkson and R. Rosenfeld, "Statistical Language Modeling Using the CMU-Cambridge Toolkit," Proc. Eurospeech, pp. 2707-2710, 1997.

[27] A. Stolcke, "SRILM-an Extensible Language Modeling Toolkit," Proc. International Conference on Spoken Language Processing, pp. 901-904, 2002.

References Weight

Web of Science® Citations for all references: 945 TCR
SCOPUS® Citations for all references: 1,875 TCR

Web of Science® Average Citations per reference: 35 ACR
SCOPUS® Average Citations per reference: 69 ACR

TCR = Total Citations for References / ACR = Average Citations per Reference
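The TCR and ACR figures can be reproduced from the per-reference "Times Cited" counts shown in the reference list. The short Python sketch below does that arithmetic; the function name `references_weight` is hypothetical, and dividing by all 27 references (including those with no citation count) is an assumption that happens to match the figures reported on this page.

```python
# "Times Cited" counts copied from the reference list on this page.
wos_counts = [165, 118, 595, 9, 19, 29, 10]
scopus_counts = [284, 279, 147, 1070, 13, 28, 41, 13]

# Assumption: the average is taken over every listed reference, cited or not.
NUM_REFERENCES = 27


def references_weight(counts, num_references):
    """Return (TCR, ACR): total citations and average citations per reference."""
    tcr = sum(counts)
    acr = tcr / num_references
    return tcr, acr


if __name__ == "__main__":
    for name, counts in (("Web of Science", wos_counts), ("SCOPUS", scopus_counts)):
        tcr, acr = references_weight(counts, NUM_REFERENCES)
        print(f"{name}: {tcr} TCR, {round(acr)} ACR")
```

Under these assumptions the totals come to 945 and 1,875, and the rounded averages to 35 and 69, in agreement with the values listed above.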

In 2010 we introduced, for the first time in scientific publishing, the term "References Weight" as a quantitative indication of the quality ...

Citations for references updated on 2024-11-16 09:48 in 56 seconds.




Note1: Web of Science® is a registered trademark of Clarivate Analytics.
Note2: SCOPUS® is a registered trademark of Elsevier B.V.
Disclaimer: All queries to the respective databases were made by using the DOI record of every reference (where available). Due to technical problems beyond our control, the information is not always accurate. Please use the CrossRef link to visit the respective publisher site.

Copyright ©2001-2024
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania


All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.

Permission for other use: The copyright owner's consent does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific written permission must be obtained from the Editor for such copying. Direct linking to files hosted on this website is strictly prohibited.

Disclaimer: Whilst every effort is made by the publishers and editorial board to see that no inaccurate or misleading data, opinions or statements appear in this journal, they wish to make it clear that all information and opinions formulated in the articles, as well as linguistic accuracy, are the sole responsibility of the author.



