Institute of Sound and Vibration Research, University of Southampton, Southampton, UK.
Int J Audiol. 2012 Feb;51(2):75-82. doi: 10.3109/14992027.2011.625984. Epub 2011 Nov 22.
Established methods for predicting speech recognition in noise require knowledge of the clean speech signal, which limits their application. This study evaluates an alternative approach based on characteristics of the noisy speech itself, specifically its sparseness as measured by the statistic kurtosis.
Experiments 1 and 2 involved acoustic analysis of vowel-consonant-vowel (VCV) syllables in babble noise, comparing kurtosis, glimpsing areas, and extended speech intelligibility index (ESII) of noisy speech signals with one another and with pre-existing speech recognition scores. Experiment 3 manipulated kurtosis of VCV syllables and investigated effects on speech recognition scores in normal-hearing listeners.
The study sample comprised pre-existing speech recognition data for Experiments 1 and 2, and seven normal-hearing participants for Experiment 3.
Experiments 1 and 2 demonstrated that kurtosis calculated in the time domain from noisy speech is highly correlated (r > 0.98) with two established prediction models, glimpsing and ESII. All three measures predicted speech recognition scores well. Experiment 3 showed a clear monotonic relationship between speech recognition scores and kurtosis.
Speech recognition performance in noise is closely related to the sparseness (kurtosis) of the noisy speech signal, at least for the types of speech and noise used here and for listeners with normal hearing.
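As a rough illustration of the sparseness measure, the sketch below computes time-domain kurtosis of a noisy signal at several signal-to-noise ratios. The abstract does not state the exact formula used, so the standard Pearson definition (fourth central moment divided by squared variance, about 3 for Gaussian noise) is an assumption here. The signals are synthetic stand-ins (a mostly silent burst sequence for speech, Gaussian noise for babble), not the VCV stimuli from the study, and the function names `kurtosis` and `mix_at_snr` are hypothetical.

```python
import math
import random


def kurtosis(samples):
    """Pearson kurtosis: fourth central moment over squared variance."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 ** 2)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise to reach the requested SNR (in dB), then add it to the speech."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * v for s, v in zip(speech, noise)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Assumed stand-in for speech: silent about 90% of the time, with occasional bursts.
speech = [rng.gauss(0, 1) if rng.random() < 0.1 else 0.0 for _ in range(n)]
# Assumed stand-in for the masker: stationary Gaussian noise.
noise = [rng.gauss(0, 1) for _ in range(n)]

for snr_db in (12, 0, -12):
    noisy = mix_at_snr(speech, noise, snr_db)
    print(f"SNR {snr_db:+d} dB: kurtosis = {kurtosis(noisy):.2f}")
```

In this toy setting kurtosis falls toward the Gaussian value as the SNR decreases, because the added noise fills the gaps that made the signal sparse. That trend is only meant to show why kurtosis can be computed from the noisy mixture alone; it does not reproduce the study's correlations.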