3/2025 - 7
A Novel Approach to the Phonetic Representativeness of Writing Systems: Phonetic Correspondence Efficiency (PCE)
TOHMA, K.
Author keywords
natural language processing, text processing, error analysis, text analysis, natural language
About this article
Date of Publication: 2025-10-31
Volume 25, Issue 3, Year 2025, On page(s): 59 - 68
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2025.03007
Abstract
The phonetic representativeness of writing systems is of critical importance for the advancement of language technologies. In this study, the Phonetic Correspondence Efficiency (PCE) metric is introduced to measure how accurately, uniquely, and economically a writing system captures the sound structure of a language. To evaluate the metric's validity and practical utility, comparative analyses were conducted using datasets updated in accordance with the Common Turkic Alphabet; the results demonstrate that these updates effectively reflect improvements and changes in orthography-pronunciation alignment. Improvements of 18-19% were observed with weighted computation methods, while logarithmic approaches yielded enhancements of 11-12%. Additionally, segment-based computations indicate that the method maintains consistent performance across different scales. As PCE does not require training data or pre-trained models, it stands out as an innovative metric for assessing the overall phonetic alignment of writing systems from a universal perspective, offering significant potential for writing reforms and the development of natural language processing applications.
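The abstract does not state the PCE formula, so the following is not the author's method. It is only a toy sketch of the general idea of scoring orthography-pronunciation alignment without any trained model, and of how a frequency-weighted score can differ from an unweighted one. The function name `consistency` and the aligned (grapheme, phoneme) pairs are hypothetical and chosen purely for illustration.

```python
from collections import Counter, defaultdict


def consistency(pairs, weighted=True):
    """Toy grapheme-to-phoneme consistency score (NOT the PCE formula).

    pairs: iterable of (grapheme, phoneme) tuples, assumed already aligned.
    weighted=True  -> share of grapheme tokens read with their dominant phoneme.
    weighted=False -> mean of that share taken per distinct grapheme.
    """
    by_grapheme = defaultdict(Counter)
    for grapheme, phoneme in pairs:
        by_grapheme[grapheme][phoneme] += 1
    if not by_grapheme:
        raise ValueError("pairs must not be empty")

    if weighted:
        # Frequent graphemes contribute in proportion to their token counts.
        dominant = sum(max(c.values()) for c in by_grapheme.values())
        total = sum(sum(c.values()) for c in by_grapheme.values())
        return dominant / total

    # Every distinct grapheme contributes equally, regardless of frequency.
    shares = [max(c.values()) / sum(c.values()) for c in by_grapheme.values()]
    return sum(shares) / len(shares)


# Hypothetical data: "a" always maps to /a/, "c" maps to /k/ three times
# and to /s/ once.
pairs = [("a", "a")] * 6 + [("c", "k")] * 3 + [("c", "s")] * 1

print(consistency(pairs, weighted=True))
print(consistency(pairs, weighted=False))
```

In this toy setting, a spelling reform that gave /s/ its own letter would raise both scores to 1.0, which is the kind of before/after comparison the abstract describes for the Common Turkic Alphabet datasets.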
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.
Permission for other use: The copyright owner's consent does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific written permission must be obtained from the Editor for such copying. Direct linking to files hosted on this website is strictly prohibited.
Disclaimer: Whilst every effort is made by the publishers and editorial board to see that no inaccurate or misleading data, opinions or statements appear in this journal, they wish to make it clear that all information and opinions formulated in the articles, as well as linguistic accuracy, are the sole responsibility of the author.


