Kalita Sishir, Mahadeva Prasanna S R, Dandapat S
Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India.
J Acoust Soc Am. 2018 Oct;144(4):2413. doi: 10.1121/1.5064463.
Intelligibility is considered one of the primary measures for speech rehabilitation of individuals with a cleft lip and palate (CLP). Objective methods based on speech processing and machine learning are attracting growing research interest as a way to quantify speech intelligibility. In this work, joint spectro-temporal features computed from a time-frequency representation of speech are explored to derive speech representations based on Gaussian posteriograms. A comparative framework using dynamic time warping (DTW) is used to quantify the intelligibility of child CLP speech. The DTW distance serves as a sentence-level intelligibility score, and its correlation with perceptual intelligibility ratings obtained from expert speech-language pathologists is examined. A baseline DTW system using conventional Mel-frequency cepstral coefficients (MFCCs) is also developed for comparison with the proposed system. Agreement is measured with Spearman's rank correlation coefficient between the objective intelligibility scores and the perceptual intelligibility ratings. A Williams significance test is conducted to assess the statistical significance of the difference in correlation between the methods. The results show that the system based on joint spectro-temporal features significantly outperforms the MFCC-based system.
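To illustrate the DTW scoring idea described in the abstract, the following is a minimal sketch of a DTW distance between two frame-level feature sequences. It is not the authors' implementation: the function name `dtw_distance`, the Euclidean local cost, the unconstrained step pattern, and the normalization by the summed sequence lengths are all assumptions made for illustration. The abstract does not specify the local distance used on the Gaussian posteriograms.

```python
import numpy as np

def dtw_distance(ref, test):
    """Length-normalized DTW distance between two (frames x dims) sequences.

    Hypothetical helper: Euclidean local cost and (n + m) normalization
    are illustrative assumptions, not details taken from the paper.
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    n, m = len(ref), len(test)

    # Local cost: Euclidean distance between every pair of frames.
    cost = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)

    # Accumulated cost with a border of infinities around the grid.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev

    # Normalize so that utterances of different lengths are comparable.
    return acc[n, m] / (n + m)
```

Under these assumptions, a larger distance between a test utterance and a reference utterance of the same sentence would be read as lower intelligibility, and the per-sentence distances could then be rank-correlated with the perceptual ratings.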