Department of Electronic Engineering, Konkuk University, Seoul 143-701, Korea.
IEEE Trans Biomed Eng. 2010 Jul;57(7):1587-95. doi: 10.1109/TBME.2010.2041455. Epub 2010 Feb 17.
It is well known that a clear relationship exists between the human voice and myoelectric signals (MESs) recorded from the area around the speaker's mouth. In this study, we used this relationship to implement a speech synthesis scheme in which MES alone was used to predict the parameters characterizing the vocal-tract transfer function of specific speech signals. Several feature parameters derived from MES were investigated to find the feature that maximizes the mutual information between the acoustic and MES features. Once the optimal feature was determined, an estimation rule for the acoustic parameters was proposed, based on a minimum mean square error (MMSE) criterion. In a preliminary study, 60 isolated words were used for both objective and subjective evaluations. The results showed that the average Euclidean distance between the original and predicted acoustic parameters was reduced by about 30% compared with the average Euclidean distance of the original parameters. The intelligibility of speech synthesized from the predicted features was also evaluated. A listening test yielded a word-level identification ratio of 65.5% and a syllable-level identification ratio of 73%.
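The two steps named in the abstract, choosing the MES feature by mutual information and then estimating acoustic parameters under an MMSE criterion, can be illustrated with a toy sketch. The abstract does not specify the actual features, the mutual information estimator, or the form of the estimation rule, so everything below is an assumption made for illustration: the features are taken to be already quantized into discrete codes, mutual information is estimated with a simple plug-in (count-based) formula, and the MMSE estimate is taken as the conditional mean of the acoustic parameter given the feature code, which is the general MMSE solution. All names (`mutual_information`, `fit_conditional_mean`, `feature_a`, `feature_b`) and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def fit_conditional_mean(xs, ys):
    """Return a table x -> mean of y given x (the MMSE estimate for discrete x)."""
    sums = defaultdict(float)
    counts = Counter()
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in counts}


# Toy data (invented): a quantized acoustic class per frame and two
# hypothetical quantized MES features observed on the same frames.
acoustic_class = [0, 0, 1, 1, 0, 0, 1, 1]
feature_a = [0, 0, 1, 1, 0, 0, 1, 1]  # tracks the acoustic class
feature_b = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to the acoustic class

# Step 1: keep the candidate feature with the largest mutual information.
candidates = {"feature_a": feature_a, "feature_b": feature_b}
scores = {name: mutual_information(f, acoustic_class) for name, f in candidates.items()}
best_name = max(scores, key=scores.get)

# Step 2: fit the conditional-mean (MMSE) estimator of a continuous
# acoustic parameter from the selected feature.
acoustic_param = [1.0, 1.2, 3.0, 3.2, 0.8, 1.0, 2.8, 3.0]
estimator = fit_conditional_mean(candidates[best_name], acoustic_param)
predicted = [estimator[x] for x in candidates[best_name]]

print(best_name, scores)
print(predicted)
```

In this toy setup the feature that tracks the acoustic class carries more mutual information than the unrelated one, so it is the one passed to the estimator. With continuous, multidimensional MES features, the lookup table would need to be replaced by a regression or density-based estimator.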