Zada Bakht, Ullah Rahim
Government Degree College Samar Bagh, Pakistan.
Heliyon. 2020 Feb 12;6(2):e03372. doi: 10.1016/j.heliyon.2020.e03372. eCollection 2020 Feb.
Speech recognition has become one of the most significant parts of human-computer interaction owing to the emergence of technologies such as smartphones and smart watches, and this has created a need for automatic speech recognition (ASR) systems for local languages. The aim of this paper is to develop an isolated-digit recognition system for the Pashto language using a deep convolutional neural network (CNN). A database of Pashto digits from 0 to 9, with 50 utterances of each digit, is used. Twenty Mel-frequency cepstral coefficient (MFCC) features are extracted for each isolated digit and fed as input to the CNN. The network used in the proposed system has up to 4 convolutional layers, followed by ReLU and max-pooling layers. The network was trained on 50% of the data, and the remaining data was used for testing. An overall average accuracy of 84.17% was achieved on the test data, which shows 7.32% better performance compared to existing similar works.
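The data protocol described in the abstract (10 digits, 50 utterances each, half used for training and half for testing, accuracy reported on the test half) can be illustrated with a minimal sketch. The abstract does not say how the split was drawn, so the per-digit (stratified) split, the random seed, and the helper names `stratified_half_split` and `accuracy` are assumptions made for illustration only and are not taken from the paper.

```python
import random
from collections import defaultdict

NUM_DIGITS = 10            # Pashto digits 0-9 (from the abstract)
UTTERANCES_PER_DIGIT = 50  # utterances per digit (from the abstract)


def stratified_half_split(labels, seed=0):
    """Split sample indices 50/50 so each digit contributes equally to both halves.

    Assumption: the paper only states a 50% train / 50% test split; splitting
    per digit is one plausible way to realise it.
    """
    by_digit = defaultdict(list)
    for idx, digit in enumerate(labels):
        by_digit[digit].append(idx)

    rng = random.Random(seed)
    train, test = [], []
    for digit in sorted(by_digit):
        indices = by_digit[digit][:]
        rng.shuffle(indices)
        half = len(indices) // 2
        train.extend(indices[:half])
        test.extend(indices[half:])
    return train, test


def accuracy(predicted, actual):
    """Return the percentage of predictions that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# One label per utterance: 50 zeros, 50 ones, ..., 50 nines.
labels = [d for d in range(NUM_DIGITS) for _ in range(UTTERANCES_PER_DIGIT)]
train_idx, test_idx = stratified_half_split(labels)
```

Under these assumptions each half holds 250 utterances, 25 per digit. In a full pipeline the test indices would select the MFCC feature matrices passed to the trained CNN, and `accuracy` would be applied to its predicted digits.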