Lachhab Othman, Di Martino Joseph, Elhaj Elhassane Ibn, Hammouch Ahmed
LRGE Laboratory, ENSET, Mohammed 5 University, Madinat Al Irfane, Rabat, Morocco.
LORIA, B.P. 239, Vandœuvre-lès-Nancy, 54506 France.
Springerplus. 2015 Oct 26;4:644. doi: 10.1186/s40064-015-1428-2. eCollection 2015.
In this paper, we propose a hybrid system based on a modified statistical GMM voice conversion algorithm for improving the recognition of esophageal speech. The system uses voice conversion to compensate for the distorted information present in esophageal acoustic features. The esophageal speech is converted into a "target" laryngeal speech by iterative statistical estimation of a transformation function. No speech synthesizer is applied to reconstruct the converted speech signal, because the converted Mel cepstral vectors are used directly as input to our speech recognition system. Furthermore, the feature vectors are linearly transformed by the HLDA (heteroscedastic linear discriminant analysis) method, which reduces their dimensionality by projecting them into a smaller space with good discriminative properties. The experimental results show that the proposed system improves phone recognition accuracy by an absolute 3.40% over the accuracy obtained with neither HLDA nor voice conversion.
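To make the conversion step in the abstract more concrete, the sketch below shows the conventional joint-density GMM mapping, in which a source (esophageal) feature vector is converted by taking the conditional expectation of the target (laryngeal) vector. It is only an illustration under stated assumptions: the abstract does not give the details of the modified, iterative algorithm, so this is the textbook form rather than the authors' method. The function name `gmm_convert`, its parameters, and the diagonal covariance blocks are all assumptions introduced here for brevity.

```python
import math

def gmm_convert(x, weights, mu_x, mu_y, var_x, cov_yx):
    """Map one source feature vector x to an estimate of the target vector y.

    Computes E[y | x] under a joint GMM whose covariance blocks are diagonal
    (an assumption made here; this is not the paper's modified algorithm).
    """
    n_mix, dim = len(weights), len(x)
    # Log-likelihood of x under each component (shared constant terms dropped)
    log_lik = []
    for m in range(n_mix):
        ll = math.log(weights[m])
        for d in range(dim):
            diff = x[d] - mu_x[m][d]
            ll -= 0.5 * (math.log(var_x[m][d]) + diff * diff / var_x[m][d])
        log_lik.append(ll)
    # Posterior P(m | x), computed stably by subtracting the maximum
    peak = max(log_lik)
    post = [math.exp(ll - peak) for ll in log_lik]
    total = sum(post)
    post = [p / total for p in post]
    # Posterior-weighted sum of per-component linear regressions
    y = [0.0] * dim
    for m in range(n_mix):
        for d in range(dim):
            slope = cov_yx[m][d] / var_x[m][d]
            y[d] += post[m] * (mu_y[m][d] + slope * (x[d] - mu_x[m][d]))
    return y
```

In a pipeline like the one described, vectors converted this way would be passed to the recognizer directly, without resynthesizing a waveform. The GMM parameters would have to be estimated beforehand from aligned esophageal and laryngeal training frames, which this sketch does not cover.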