Wielgat Robert, Zieliński Tomasz P, Woźniak Tomasz, Grabias Stanisław, Król Daniel
Department of Technology, Higher State Vocational School in Tarnów, Tarnów, Poland.
Folia Phoniatr Logop. 2008;60(6):323-31. doi: 10.1159/000170083. Epub 2008 Nov 11.
Proper diagnosis and therapy of pathological phoneme pronunciation play an important role in modern logopedics. To make diagnosis and therapy more efficient, this paper addresses the automatic recognition of pathological phoneme pronunciation. The authors focus on the therapy of phoneme substitution disorders.
The speech samples came from speech-impaired Polish children and, in part, from persons imitating speech disorders. The speech disorders to be recognized were substitutions in pairs (for the correct phonetic characters please see online article) embedded in Polish carrier words. To detect substitutions in the recognized words, the recently proposed human factor cepstral coefficients (HFCC) were implemented. The efficiency of the HFCC approach was compared with that of standard mel-frequency cepstral coefficients (MFCC) as a feature vector. Both dynamic time warping (DTW), working on whole words or embedded phoneme patterns, and hidden Markov models (HMM) were used as classifiers. The HMM classifier was based on whole-word models as well as phoneme models. The results present a comparative analysis of the DTW and HMM methods.
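As an illustration of the whole-word DTW classification mentioned above, the following is a minimal sketch of a nearest-template DTW classifier. It is not the authors' implementation: the function names `dtw_distance` and `classify`, the Euclidean local distance, the unconstrained step pattern, and the toy one-dimensional "feature vectors" are all assumptions made for this example. In practice each frame would be an HFCC or MFCC vector.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumptions (not from the paper): Euclidean local distance and the
    basic step pattern (insertion, deletion, match) with no path constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def classify(sample, templates):
    """Return the label of the template closest to `sample` under DTW.

    `templates` maps a label (hypothetical, e.g. "correct" or
    "substituted") to a list of reference sequences.
    """
    best_label, best_cost = None, math.inf
    for label, sequences in templates.items():
        for seq in sequences:
            d = dtw_distance(sample, seq)
            if d < best_cost:
                best_label, best_cost = label, d
    return best_label


if __name__ == "__main__":
    # Toy data: one-dimensional frames standing in for cepstral vectors.
    templates = {
        "correct": [[[0.0], [1.0], [2.0]]],
        "substituted": [[[5.0], [6.0], [7.0]]],
    }
    sample = [[0.1], [0.9], [1.0], [2.1]]
    print(classify(sample, templates))
```

Because DTW stretches the time axis, a sample that is longer or shorter than a template can still match it closely, which is why the approach suits words spoken at different rates.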
HFCC features were shown to be superior to MFCC features. Results obtained by the DTW methods, mainly by the modified phoneme-based DTW classifier, were slightly better than those of the HMM classifier. Results obtained for the detection of substitutions in pairs (for the correct phonetic characters please see online article) are very promising. The methods developed for these cases can be integrated into computer systems for speech therapy. For substitutions in pairs (for the correct phonetic characters please see online article) further research is necessary.