Najnin Shamima, Banerjee Bonny
Institute for Intelligent Systems, and Department of Electrical and Computer Engineering, 3815 Central Avenue, The University of Memphis, Memphis, Tennessee 38152, USA
J Acoust Soc Am. 2015 Sep;138(3):EL229-35. doi: 10.1121/1.4929626.
The problem of nonlinear acoustic-to-articulatory inversion mapping is investigated in the feature space using two models: the deep belief network (DBN), which is the state of the art, and the general regression neural network (GRNN). The task is to estimate a set of articulatory features for improved speech recognition. Experiments with the MOCHA-TIMIT and MNGU0 databases reveal that, for speech inversion, GRNN yields a lower root-mean-square error and a higher correlation than DBN. It is also shown that combining acoustic features with GRNN-estimated articulatory features yields state-of-the-art accuracy in broad-class phonetic classification and phoneme recognition while using less computational power.
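To make the GRNN idea concrete, here is a minimal sketch, assuming the standard GRNN formulation in which the estimate is a Gaussian-kernel-weighted average of the training targets. It is not the authors' implementation. The function names, the kernel width `sigma`, and the toy data are all illustrative assumptions, and the abstract does not state the features or settings actually used. The two helper functions compute the evaluation measures named in the abstract, root-mean-square error and Pearson correlation.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=0.5):
    """Estimate the target vector for `query` as a kernel-weighted
    average of the training targets (GRNN-style estimate).

    train_x: list of input vectors (e.g. acoustic features; hypothetical)
    train_y: list of target vectors (e.g. articulatory features; hypothetical)
    sigma:   Gaussian kernel width, an assumed smoothing parameter
    """
    # Squared Euclidean distance from the query to every training input.
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    # Shift by the smallest distance so at least one weight equals 1.
    # The shift cancels in the ratio below and avoids a zero denominator.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    n_out = len(train_y[0])
    return [
        sum(w * y[k] for w, y in zip(weights, train_y)) / total
        for k in range(n_out)
    ]


def rmse(pred, true):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def pearson(pred, true):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(true)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


# Toy one-dimensional data (made up for illustration only).
toy_x = [[0.0], [1.0], [2.0], [3.0]]
toy_y = [[0.0], [2.0], [4.0], [6.0]]

if __name__ == "__main__":
    estimate = grnn_predict(toy_x, toy_y, [1.5], sigma=0.5)
    print("estimate at 1.5:", estimate)
```

A GRNN of this form has no iterative training phase: it stores the training pairs and has a single smoothing parameter, which is one plausible reading of the lower computational cost reported in the abstract, although the abstract itself does not say where the savings come from.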