FACTS & FIGURES

JCR Impact Factor: 0.700
JCR 5-Year IF: 0.700
SCOPUS CiteScore: 1.8
Issues per year: 4
Current issue: Aug 2024
Next issue: Nov 2024
Avg review time: 58 days
Avg accept to publ: 60 days
APC: 300 EUR


PUBLISHER

Stefan cel Mare
University of Suceava
Faculty of Electrical Engineering and
Computer Science
13, Universitatii Street
Suceava - 720229
ROMANIA

Print ISSN: 1582-7445
Online ISSN: 1844-7600
WorldCat: 643243560
doi: 10.4316/AECE


    
 

  3/2023 - 1

Exploring the Impact of Data Augmentation Techniques on Automatic Speech Recognition System Development: A Comparative Study

GALIC, J., GROZDIC, D.


Author keywords
artificial neural networks, audio databases, automatic speech recognition, hidden Markov models, support vector machines

References keywords
speech(22), recognition(15), data(13), augmentation(12), processing(7), audio(7), interspeech(6), signal(5), science(5), whispered(4)

About this article
Date of Publication: 2023-08-31
Volume 23, Issue 3, Year 2023, On page(s): 3 - 12
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2023.03001
Web of Science Accession Number: 001062641900001
SCOPUS ID: 85172345871

Abstract
Automatic Speech Recognition (ASR) systems are known to perform poorly in adverse conditions, showing high sensitivity and low robustness. Because building extensive speech databases is costly and time-consuming, a prominent line of research addresses low robustness by synthetically generating speech data from pre-existing natural speech. This paper examines the impact of standard data augmentation techniques (pitch shift, time stretch, volume control, and their combination) on the accuracy of isolated-word ASR systems. The performance of three machine learning models, namely Hidden Markov Models (HMM), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN), is analyzed on two Serbian corpora of isolated words. The Whi-Spe speech database in neutral phonation is used for augmentation and training, and a purpose-built Python-based software tool performs the augmentation. The experiments demonstrate a statistically significant reduction in the Word Error Rate (WER) for the CNN-based recognizer on both testing datasets, achieved with a single augmentation technique based on pitch shifting.
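
The abstract reports its results as Word Error Rate (WER). As a minimal sketch of how that metric is commonly computed (this is not the authors' evaluation code), the Python below derives WER from the word-level edit distance between a reference and a recognizer output. It also shows the isolated-word special case, where each utterance is a single word and WER reduces to the fraction of misrecognized utterances. The function names and the example word labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions
    needed to turn the reference word sequence into the hypothesis."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance to an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """WER = edit distance / number of reference words."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def isolated_word_wer(true_labels: Sequence[str],
                      predicted_labels: Sequence[str]) -> float:
    """Isolated-word case: one word per utterance, so WER is simply
    the share of utterances whose predicted label is wrong."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and equally long")
    errors = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return errors / len(true_labels)


# Hypothetical labels, for illustration only (not data from the paper).
truth = ["one", "two", "three", "four"]
predicted = ["one", "two", "tree", "four"]
print(isolated_word_wer(truth, predicted))  # 0.25
print(word_error_rate(truth, predicted))    # 0.25
```

For equally long label lists with one word per utterance the two functions agree, which is why isolated-word studies often report WER as a plain misclassification rate. Comparing the WER of a recognizer trained with and without augmented data on the same test set is then the basis for a significance test such as the one the abstract mentions.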


References

[1] D. R. Hill, "Man-machine interaction using speech," Advances in Computers, vol. 11, pp. 165-230, 1971.
[CrossRef] [SCOPUS Times Cited 19]


[2] J.-U. Bang, M.-Y. Choi, S.-H. Kim, and O. W. Kwon, "Automatic construction of a large-scale speech recognition database using multi-genre broadcast data with inaccurate subtitle timestamps," IEICE Trans. Inf. Syst., vol. 103-D, pp. 406-415, 2020.
[CrossRef] [Web of Science Times Cited 9] [SCOPUS Times Cited 12]


[3] D. K. Singh, P. P. Amin, H. B. Sailor, and H. A. Patil, "Data augmentation using CycleGAN for end-to-end children ASR," in 2021 29th European Signal Processing Conference (EUSIPCO), 2021, pp. 511-515.
[CrossRef] [Web of Science Times Cited 7] [SCOPUS Times Cited 14]


[4] A. Chatziagapi et al., "Data Augmentation Using GANs for speech emotion recognition," in Proc. Interspeech 2019, 2019, pp. 171-175.
[CrossRef] [Web of Science Times Cited 75] [SCOPUS Times Cited 104]


[5] T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, "Audio augmentation for speech recognition," in Proc. Interspeech 2015, 2015, pp. 3586-3589.
[CrossRef]


[6] M. P. Fernandez-Gallego and D. T. Toledano, "A study of data augmentation for ASR robustness in low bit rate contact center recordings including packet losses," Applied Sciences, vol. 12, no. 3, 2022.
[CrossRef] [Web of Science Times Cited 2] [SCOPUS Times Cited 2]


[7] P. R. R. Gudepu et al., "Whisper augmented end-to-end/hybrid speech recognition system - CycleGAN approach," in Proc. of Interspeech, Shanghai International Convention Center (virtual), Shanghai, China, 2020, pp. 2302-2306.
[CrossRef] [Web of Science Times Cited 5] [SCOPUS Times Cited 12]


[8] B. T. Atmaja and A. Sasou, "Effects of data augmentations on speech emotion recognition," Sensors, vol. 22, no. 16, 2022.
[CrossRef] [Web of Science Times Cited 9] [SCOPUS Times Cited 14]


[9] T. Ko, V. Peddinti, D. Povey, M. L. Seltzer, and S. Khudanpur, "A study on data augmentation of reverberant speech for robust speech recognition," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5220-5224, 2017.
[CrossRef] [SCOPUS Times Cited 674]


[10] J. M. Ramirez, A. Montalvo, and J. R. Calvo, "A survey of the effects of data augmentation for automatic speech recognition systems," in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, I. Nystrom, Y. Hernandez Heredia, and V. Milian Nunez, Eds., Cham: Springer International Publishing, 2019, pp. 669-678.
[CrossRef] [Web of Science Times Cited 10] [SCOPUS Times Cited 10]


[11] R. Damania, "Data augmentation for automatic speech recognition for low resource languages," Rochester Institute of Technology, NY, United States of America, 2021.

[12] O. O. Abayomi-Alli, R. Damasevicius, A. Qazi, M. Adedoyin-Olowe, and S. Misra, "Data augmentation and deep learning methods in sound classification: A systematic review," Electronics, vol. 11, no. 22, 2022.
[CrossRef] [Web of Science Times Cited 27] [SCOPUS Times Cited 40]


[13] T. Sugiura, A. Kobayashi, T. Utsuro, and H. Nishizaki, "Audio synthesis-based data augmentation considering audio event class," in 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), 2021, pp. 60-64.
[CrossRef] [SCOPUS Times Cited 6]


[14] M. Muthumari, C. A. Bhuvaneswari, J. E. N. S. Kumar Babu, and S. P. Raju, "Data augmentation model for audio signal extraction," in 2022 3rd International Conference on Electronics and Sustainable Communication Systems (ICESC), 2022, pp. 334-340.
[CrossRef] [SCOPUS Times Cited 3]


[15] B. Markovic, S. T. Jovicic, J. Galic, and D. Grozdic, "Whispered speech database: Design, processing and application," in Text, Speech, and Dialogue, I. Habernal and V. Matousek, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 591-598.
[CrossRef] [SCOPUS Times Cited 18]


[16] S. T. Jovicic, "Serbian emotional speech database: Design, processing and evaluation," in Proc. 9th Conference on Speech and Computer (SPECOM), 2004, pp. 77-81.

[17] [Online] Available: Temporary on-line reference link removed - see the PDF document

[18] I.-D. Borlea, R.-E. Precup, and A. Borlea, "Improvement of K-means cluster quality by post processing resulted clusters," Procedia Computer Science, vol. 199, pp. 63-70, Feb. 2022.
[CrossRef] [Web of Science Times Cited 74] [SCOPUS Times Cited 87]


[19] C. Pozna and R.-E. Precup, "Aspects concerning the observation process modelling in the framework of cognition processes," Acta Polytechnica Hungarica, vol. 9, no. 1, pp. 203-223, 2012.

[20] S. Ogutcu et al., "Early detection of mortality in COVID-19 patients through laboratory findings with factor analysis and artificial neural networks," Romanian Journal of Information Science and Technology, vol. 25, no. 4, pp. 290-302, 2022.

[21] E. Arican and T. Aydin, "An RGB-D descriptor for object classification," Romanian Journal of Information Science and Technology, vol. 25, no. 3-4, pp. 338-349, 2022.

[22] L. Ferreira-Paiva, E. Alfaro-Espinoza, V. M. Almeida, L. B. Felix, and R. V. A. Neves, "A survey of data augmentation for audio classification," in XXIV Brazilian Congress of Automatics (CBA), 2022.

[23] G. Maguolo, M. Paci, L. Nanni, and L. Bonan, "Audiogmenter: a MATLAB toolbox for audio data augmentation," Applied Computing and Informatics, Jan. 2021.
[CrossRef] [SCOPUS Times Cited 12]


[24] D. T. Grozdic, S. T. Jovicic, and M. Subotic, "Whispered speech recognition using deep denoising autoencoder," Engineering Applications of Artificial Intelligence, vol. 59, pp. 15-22, 2017.
[CrossRef] [Web of Science Times Cited 54] [SCOPUS Times Cited 66]


[25] The MathWorks, Inc. (2021). MATLAB version: R2021b. Accessed: June 01, 2022. Available: https://www.mathworks.com

[26] J. Galic, S. T. Jovicic, D. Grozdic, and B. Markovic, "HTK-based recognition of whispered speech," in Speech and Computer, A. Ronzhin, R. Potapova, and V. Delic, Eds., Cham: Springer International Publishing, 2014, pp. 251-258.
[CrossRef] [SCOPUS Times Cited 10]


[27] S. Young et al., "The HTK Book (for HTK Version 3.4)," Cambridge University Engineering Department, 2006.

[28] J. Bernal-Chaves, C. Pelaez-Moreno, A. Gallardo-Antolin, and F. Diaz-de-Maria, "Multiclass SVM-based isolated-digit recognition using a HMM-guided segmentation," in Proc. ITRW on Nonlinear Speech Processing (NOLISP 2005), 2005, pp. 137-144.

[29] Z. Qu, L. Yu, L. Zhang, and M. Shao, "A speech recognition system based on a hybrid HMM/SVM architecture," in First International Conference on Innovative Computing, Information and Control - Volume I (ICICIC'06), 2006, pp. 100-104.
[CrossRef]


[30] J. M. Garcia-Cabellos, C. Pelaez-Moreno, A. Gallardo-Antolin, F. Perez-Cruz, and F. Diaz-de-Maria, "SVM classifiers for ASR: A discussion about parameterization," in 12th European Signal Processing Conference, 2004, pp. 2067-2070.
[CrossRef]


[31] J. Galic, B. Popovic, and D. Sumarac Pavlovic, "Whispered speech recognition using hidden Markov models and support vector machines," Acta Polytechnica Hungarica, vol. 15, no. 5, pp. 11-29, 2018.
[CrossRef] [SCOPUS Times Cited 8]


[32] A. Alsobhani, H. M. A. ALabboodi, and H. Mahdi, "Speech recognition using convolution deep neural networks," Journal of Physics: Conference Series, vol. 1973, no. 1, p. 012166, Aug. 2021.
[CrossRef] [SCOPUS Times Cited 23]


[33] G. Habib and S. Qureshi, "Optimization and acceleration of convolutional neural networks: A survey," Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 7, pp. 4244-4268, 2022.
[CrossRef] [Web of Science Times Cited 44] [SCOPUS Times Cited 72]


[34] [Online] Available: Temporary on-line reference link removed - see the PDF document

[35] W. C. Sabine, Collected Papers on Acoustics, Harvard University Press, reprint edition, pp. 3-69, 1922.

[36] B. McFee et al., "Librosa: Audio and music signal analysis in Python," in Proceedings of the 14th Python in Science Conference, 2015, pp. 18-24.
[CrossRef]


[37] P. Virtanen et al., "SciPy 1.0: fundamental algorithms for scientific computing in Python," Nat Methods, vol. 17, no. 3, pp. 261-272, Mar. 2020.
[CrossRef] [Web of Science Times Cited 17739] [SCOPUS Times Cited 19224]


[38] J. D. Gibbons and S. Chakraborti, "Nonparametric statistical inference," in International Encyclopedia of Statistical Science, M. Lovric, Ed., Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 977-979.
[CrossRef]




References Weight

Web of Science® Citations for all references: 18,055 TCR
SCOPUS® Citations for all references: 20,430 TCR

Web of Science® Average Citations per reference: 463 ACR
SCOPUS® Average Citations per reference: 524 ACR

TCR = Total Citations for References / ACR = Average Citations per Reference
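
As a worked check of these definitions, the short Python sketch below recomputes TCR from the per-reference "Times Cited" values shown in brackets in the reference list. The variable and function names are hypothetical. The divisor used for ACR is an assumption: the page does not state how it is chosen, so the sketch reports the average for the 38 numbered entries and for a divisor of 39, which is the one that reproduces the published averages.

```python
# "Times Cited" values transcribed from the bracketed tags in the
# reference list, in reference order (references without a tag are omitted).
WOS_COUNTS = [9, 7, 75, 2, 5, 9, 10, 27, 74, 54, 44, 17739]
SCOPUS_COUNTS = [19, 12, 14, 104, 2, 12, 14, 674, 10, 40, 6, 3,
                 18, 87, 12, 66, 10, 8, 23, 72, 19224]

LISTED_REFERENCES = 38  # numbered entries [1]..[38]


def total_citations(counts):
    """TCR: sum of the citation counts over all references."""
    return sum(counts)


def average_citations(total, n_references):
    """ACR: total citations divided by the number of references."""
    return total / n_references


for name, counts in (("Web of Science", WOS_COUNTS), ("SCOPUS", SCOPUS_COUNTS)):
    tcr = total_citations(counts)
    print(name, "TCR:", tcr)
    # Average over the 38 numbered entries (assumed divisor).
    print(name, "ACR with 38:", round(average_citations(tcr, LISTED_REFERENCES)))
    # Average with divisor 39, which matches the published ACR values.
    print(name, "ACR with 39:", round(average_citations(tcr, LISTED_REFERENCES + 1)))
```

The totals come out to 18,055 and 20,430, matching the published TCR values. Dividing by 38 gives about 475 and 538, while the published 463 and 524 correspond to a divisor of 39. Note also that a single reference, [37], contributes most of each total, so the average says little about a typical entry.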

In 2010 we introduced, for the first time in scientific publishing, the term "References Weight" as a quantitative indication of the quality ...

Citations for references updated on 2024-10-13 14:17 in 175 seconds.




Note1: Web of Science® is a registered trademark of Clarivate Analytics.
Note2: SCOPUS® is a registered trademark of Elsevier B.V.
Disclaimer: All queries to the respective databases were made by using the DOI record of every reference (where available). Due to technical problems beyond our control, the information is not always accurate. Please use the CrossRef link to visit the respective publisher site.

Copyright ©2001-2024
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania


All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.

Permission for other use: The copyright owner's consent does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific written permission must be obtained from the Editor for such copying. Direct linking to files hosted on this website is strictly prohibited.

Disclaimer: Whilst every effort is made by the publishers and editorial board to see that no inaccurate or misleading data, opinions or statements appear in this journal, they wish to make it clear that all information and opinions formulated in the articles, as well as linguistic accuracy, are the sole responsibility of the author.



