Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing.

Affiliation

Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.

Publication information

J Acoust Soc Am. 2011 Sep;130(3):1475-87. doi: 10.1121/1.3621502.

DOI:10.1121/1.3621502

PMID:21895088

Abstract

A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model is similar in structure to the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], which was developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease in intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation-frequency-selective process provides a key measure of speech intelligibility.
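The SNR(env) metric named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the envelopes are already extracted, a single modulation band is approximated by summing DFT bins rather than by a true modulation filter, envelope power is normalized by the squared mean of the envelope, the speech envelope power is estimated as the mixture envelope power minus the noise envelope power, and a small hypothetical `floor` keeps the ratio positive. The function names `band_envelope_power` and `snr_env` are illustrative only. The ideal-observer stage that maps SNR(env) to intelligibility is not shown.

```python
import cmath
import math


def band_envelope_power(envelope, fs, f_lo, f_hi):
    """AC envelope power in [f_lo, f_hi] Hz, normalized by the squared mean (DC).

    Assumption: a rectangular band of DFT bins stands in for one
    modulation filter of a filterbank.
    """
    n = len(envelope)
    dc = sum(envelope) / n
    power = 0.0
    for k in range(1, n // 2):  # skip DC (k = 0) and the Nyquist bin
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(envelope))
            # One-sided power of this bin: 2 * |X_k|^2 / N^2
            power += 2.0 * abs(coeff) ** 2 / n ** 2
    return power / dc ** 2


def snr_env(p_mix, p_noise, floor=1e-3):
    """Speech-to-noise envelope power ratio for one modulation band.

    Assumption: speech envelope power is estimated as (mixture - noise);
    `floor` is a hypothetical lower limit that avoids zero or negative values.
    """
    p_noise = max(p_noise, floor)
    p_speech = max(p_mix - p_noise, floor)
    return p_speech / p_noise


# Toy envelopes sampled at 100 Hz for 2 s, both modulated at 4 Hz.
fs = 100
t = [i / fs for i in range(2 * fs)]
env_mix = [1 + 0.5 * math.sin(2 * math.pi * 4 * x) for x in t]
env_noise = [1 + 0.1 * math.sin(2 * math.pi * 4 * x) for x in t]

# Roughly one octave around 4 Hz.
p_mix = band_envelope_power(env_mix, fs, 2.83, 5.66)
p_noise = band_envelope_power(env_noise, fs, 2.83, 5.66)
print(10 * math.log10(snr_env(p_mix, p_noise)), "dB")
```

In this toy case the mixture envelope fluctuates more strongly than the noise envelope, so the ratio is well above 1. If processing raised the noise envelope power toward or above that of the mixture, the ratio would fall to the floor, which mirrors the explanation the abstract gives for the spectral subtraction result.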
