Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 13120, Korea.
Department of Information Technologies, Samarkand Branch of Tashkent University of Information Technologies Named after Muhammad al-Khwarizmi, Tashkent 140100, Uzbekistan.
Sensors (Basel). 2022 May 12;22(10):3683. doi: 10.3390/s22103683.
Communication has been an important aspect of human life, civilization, and globalization for thousands of years. Biometric analysis, education, security, healthcare, and smart cities are only a few examples of speech recognition applications. Most studies have concentrated on English, Spanish, Japanese, or Chinese and have disregarded low-resource languages such as Uzbek, leaving their analysis an open problem. In this paper, we propose an End-To-End Deep Neural Network-Hidden Markov Model speech recognition model and a hybrid Connectionist Temporal Classification (CTC)-attention network for the Uzbek language and its dialects. The proposed approach reduces training time and improves speech recognition accuracy by effectively using the CTC objective function in attention model training. We evaluated the linguistic and lay-native speaker performances on the Uzbek language dataset, which was collected as a part of this study. Experimental results show that the proposed model achieved a word error rate of 14.3% when trained on 207 h of Uzbek language recordings.
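For readers unfamiliar with the reported metric, word error rate (WER) is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the study; the function name `word_error_rate` and the sample Uzbek sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is plain
    whitespace splitting, which real scoring tools may refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("men maktabga boraman", "men maktabga bordim"))
```

Under this definition, a WER of 14.3% corresponds to roughly one word-level error (substitution, deletion, or insertion) per seven reference words.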