Hilkhuysen Gaston, Gaubitch Nikolay, Brookes Mike, Huckvale Mark
Department of Speech, Language and Hearing Sciences, University College London, 2 Wakefield Street, London WC1N 1PF, United Kingdom.
Electrical and Electronic Engineering Department, Imperial College, Exhibition Road, London SW7 2BT, United Kingdom.
J Acoust Soc Am. 2014 Jan;135(1):439-50. doi: 10.1121/1.4837238.
Using the data presented in the accompanying paper [Hilkhuysen et al., J. Acoust. Soc. Am. 131, 531-539 (2012)], the ability of six metrics to predict the intelligibility of speech in noise before and after noise suppression was studied. The metrics considered were the Speech Intelligibility Index (SII), the fractional Articulation Index (fAI), the coherence intelligibility index based on the mid-levels in speech (CSIImid), an extension of the Normalized Coherence Metric (NCM+), a part of the speech-based envelope power model (pre-sEPSM), and the Short Term Objective Intelligibility measure (STOI). Three of the measures, SII, CSIImid, and NCM+, overpredicted intelligibility after noise reduction, whereas fAI underpredicted it. The pre-sEPSM metric worked well for speech in babble but failed with car noise. STOI gave the best predictions, but overall the intelligibility prediction errors were larger than the change in intelligibility caused by noise suppression. Suggestions for improving the metrics are discussed.