1/2017 - 4
Comparison of Cepstral Normalization Techniques in Whispered Speech Recognition
GROZDIC, D., JOVICIC, S., SUMARAC PAVLOVIC, D., GALIC, J., MARKOVIC, B.
Author keywords
automatic speech recognition, cepstral analysis, hidden Markov models, speech analysis, whisper
References keywords
speech(26), recognition(13), whispered(12), hansen(6), whisper(5), signal(5), processing(5), jovicic(5), grozdic(4), boril(4)
About this article
Date of Publication: 2017-02-28
Volume 17, Issue 1, Year 2017, On page(s): 21 - 26
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2017.01004
Web of Science Accession Number: 000396335900004
SCOPUS ID: 85014204959
Abstract
This article presents an analysis of different cepstral normalization techniques in automatic recognition of whispered and bimodal speech (speech+whisper). In these experiments, a conventional GMM-HMM speech recognizer was used as a speaker-dependent automatic speech recognition system, together with the Whi-Spe corpus, which contains utterance recordings in both normally phonated speech and whisper. The following normalization techniques were tested and compared: CMN (Cepstral Mean Normalization), CVN (Cepstral Variance Normalization), MVN (Cepstral Mean and Variance Normalization), CGN (Cepstral Gain Normalization), and quantile-based dynamic normalization techniques such as QCN and QCN-RASTA. The experimental results show to what extent each of these cepstral normalization techniques can improve whisper recognition accuracy in a mismatched train/test scenario. The best result is obtained using CMN in combination with inverse filtering, which provides an average 39.9 percent improvement in whisper recognition accuracy across all tested speakers.
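To make the first three techniques named in the abstract concrete, the following is a minimal sketch of CMN, CVN, and MVN in their textbook per-utterance form. It is not the authors' implementation: the function names, the `eps` guard, the use of population standard deviation, and the choice to compute statistics over a whole utterance are all assumptions made for illustration, and the abstract does not say how the paper computes these statistics. CVN is read here as scaling each coefficient to unit variance without removing the mean.

```python
from statistics import fmean, pstdev


def _columns(frames):
    """Transpose a list of frames into one track per cepstral coefficient."""
    return list(zip(*frames))


def cmn(frames):
    """Cepstral Mean Normalization: subtract each coefficient's utterance mean."""
    means = [fmean(col) for col in _columns(frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]


def cvn(frames, eps=1e-10):
    """Cepstral Variance Normalization: divide each coefficient by its
    utterance standard deviation (eps is an assumed guard against zero)."""
    stds = [pstdev(col) for col in _columns(frames)]
    return [[c / (s + eps) for c, s in zip(frame, stds)] for frame in frames]


def mvn(frames, eps=1e-10):
    """Mean and Variance Normalization: zero mean, then unit variance."""
    return cvn(cmn(frames), eps)


if __name__ == "__main__":
    # Toy "utterance": 3 frames, 2 cepstral coefficients per frame.
    utterance = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
    print(cmn(utterance))
    print(mvn(utterance))
```

Each function takes a list of frames, where a frame is a list of cepstral coefficients, and returns a new list of the same shape. In a mismatched train/test setup such as the one described above, the same normalization would be applied to both the training and the test features.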
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.