Center for Robust Speech Systems (CRSS), The University of Texas at Dallas, 800 West Campbell Road, Richardson, Texas 75080, USA.
J Acoust Soc Am. 2011 Dec;130(6):3992-8. doi: 10.1121/1.3647301.
Physical task stress is known to affect the fundamental frequency and other measurements of the speech signal. A corpus of physical task stress speech is analyzed using a spectrum F-ratio and frame score distribution divergences. The measurements differ between phone classes and are greater for vowels and nasals than for plosives and fricatives. In further analysis, frame score distribution divergences are used to measure the spectral dissimilarity between neutral and physical task stress speech. Frame scores are the log-likelihood ratios between Gaussian mixture models (GMMs) of physical task stress speech and of neutral speech. Mel-frequency cepstral coefficients are used as the acoustic feature inputs to the GMMs. A Laplacian distribution is fitted to the frame scores for each of ten phone classes, and the symmetric Kullback-Leibler divergence is employed to measure the change in distribution from neutral to physical task stress. The results suggest that the spectral dissimilarity is greatest for the second level of a four-level exertion measurement, and that spectral dissimilarity is greater for nasal phones than for plosives and fricatives. Further, the results suggest that different phone classes are affected differently by physical task stress.
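The frame score pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the diagonal-covariance GMM form, the maximum-likelihood Laplace fit (median and mean absolute deviation), and the choice to define the symmetric divergence as the sum of the two directed divergences are all assumptions made for illustration. The closed-form Kullback-Leibler divergence between two Laplace distributions is standard.

```python
import math
from statistics import median

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector x under a diagonal-covariance GMM.
    (Assumed model form; the abstract does not state the covariance type.)"""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comps.append(ll)
    mx = max(comps)  # log-sum-exp for numerical stability
    return mx + math.log(sum(math.exp(c - mx) for c in comps))

def frame_score(x, stress_gmm, neutral_gmm):
    """Frame score: log-likelihood ratio of the stress GMM to the neutral GMM.
    Each GMM is a hypothetical (weights, means, variances) tuple."""
    return gmm_loglik(x, *stress_gmm) - gmm_loglik(x, *neutral_gmm)

def fit_laplace(scores):
    """Maximum-likelihood Laplace fit: location = median, scale = mean |x - median|."""
    mu = median(scores)
    b = sum(abs(s - mu) for s in scores) / len(scores)
    return mu, b

def kl_laplace(p, q):
    """Closed-form KL(p || q) for Laplace distributions given as (mu, b) pairs."""
    (mu1, b1), (mu2, b2) = p, q
    d = abs(mu1 - mu2)
    return math.log(b2 / b1) + (d + b1 * math.exp(-d / b1)) / b2 - 1.0

def symmetric_kl(p, q):
    """Symmetric KL, taken here as the sum of both directions (an assumption;
    some authors use the average instead)."""
    return kl_laplace(p, q) + kl_laplace(q, p)
```

In use, one would compute frame scores for the neutral and the physical task stress frames of a given phone class, fit a Laplace distribution to each set with `fit_laplace`, and pass the two fits to `symmetric_kl`; a larger value indicates a larger shift in the score distribution for that phone class.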