Enhancing ASR Systems for Under-Resourced Languages through a Novel Unsupervised Acoustic Model Training Technique

doi:10.4316/AECE.2015.01009

1/2015 - 9

HIGHLY CITED PAPER

CUCU, H. , BUZO, A. , BESACIER, L. , BURILEANU, C.

Author keywords
speech recognition, under-resourced languages, unsupervised acoustic modeling, unsupervised training

About this article
Date of Publication: 2015-02-28
Volume 15, Issue 1, Year 2015, On page(s): 63 - 68
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2015.01009
Web of Science Accession Number: 000352158600009
SCOPUS ID: 84924787729

Abstract

Statistical speech and language processing techniques, which require large amounts of training data, are currently the state of the art in automatic speech recognition. For high-resourced, international languages this data is widely available, while for under-resourced languages the lack of data poses serious problems. Unsupervised acoustic modeling can offer a cost- and time-effective way of creating a solid acoustic model for any under-resourced language. This study describes a novel unsupervised acoustic model training method and evaluates it on speech data in an under-resourced language: Romanian. The key novelty of the method is the use of two complementary seed ASR systems to produce high-quality transcriptions, with a Character Error Rate (ChER) below 5%, for initially untranscribed speech data. The methodology leads to a relative Word Error Rate (WER) improvement of more than 10% when 100 hours of untranscribed speech are used.
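The metrics named in the abstract can be illustrated with a short Python sketch. It computes a character error rate and a word error rate from Levenshtein distance, and a relative WER improvement. The abstract does not say how the outputs of the two seed ASR systems are combined, so the `select_agreeing` filter below is purely an assumption for illustration, not the authors' method. All function names, the `max_disagreement` parameter, and the sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(ref, hyp):
    """Character-level edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def word_error_rate(ref, hyp):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = ref.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def relative_improvement(baseline_wer, new_wer):
    """Relative WER improvement, e.g. 0.10 means a 10% relative gain."""
    return (baseline_wer - new_wer) / baseline_wer


def select_agreeing(hyps_a, hyps_b, max_disagreement=0.0):
    """Hypothetical filter (an assumption, not the paper's procedure):
    keep utterances whose two seed transcripts differ by at most
    `max_disagreement`, measured as a character error rate of B against A."""
    selected = {}
    for utt_id, text_a in hyps_a.items():
        text_b = hyps_b.get(utt_id)
        if text_b is None or not text_a:
            continue
        if char_error_rate(text_a, text_b) <= max_disagreement:
            selected[utt_id] = text_a
    return selected


# Illustrative data only; these are not results from the paper.
seed_a = {"utt1": "buna ziua", "utt2": "da"}
seed_b = {"utt1": "buna ziua", "utt2": "nu"}
kept = select_agreeing(seed_a, seed_b)
```

With these made-up inputs, only `utt1` is kept because the two hypothetical seed systems agree on it exactly. Raising `max_disagreement` would trade transcription quality for a larger amount of automatically labeled data.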
