Chabot-Leclerc Alexandre, Jørgensen Søren, Dau Torsten
Department of Electrical Engineering, Centre for Applied Hearing Research, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
J Acoust Soc Am. 2014 Jun;135(6):3502-12. doi: 10.1121/1.4873517.
Speech intelligibility models typically consist of a preprocessing part that transforms stimuli into some internal (auditory) representation and a decision metric that relates the internal representation to speech intelligibility. The present study analyzed the role of modulation filtering in the preprocessing of different speech intelligibility models by comparing predictions from models that either assume a spectro-temporal (i.e., two-dimensional) or a temporal-only (i.e., one-dimensional) modulation filterbank. Furthermore, the role of the decision metric for speech intelligibility was investigated by comparing predictions from models based on the signal-to-noise envelope power ratio, SNRenv, and the modulation transfer function, MTF. The models were evaluated in conditions of noisy speech (1) subjected to reverberation, (2) distorted by phase jitter, or (3) processed by noise reduction via spectral subtraction. The results suggested that a decision metric based on the SNRenv may provide a more general basis for predicting speech intelligibility than a metric based on the MTF. Moreover, the one-dimensional modulation filtering process was found to be sufficient to account for the data when combined with a measure of across (audio) frequency variability at the output of the auditory preprocessing. A complex spectro-temporal modulation filterbank might therefore not be required for speech intelligibility prediction.
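As an illustration of the SNRenv decision metric named above, the following is a minimal sketch and not the authors' model. The abstract gives no formula, so the sketch assumes the commonly used definition in which the speech envelope power is estimated as the envelope power of the noisy speech minus that of the noise alone, and is then divided by the noise envelope power. It uses a single audio channel and a single modulation band, with no auditory filterbank and no mapping to intelligibility. The function names, the 2-8 Hz band, and the floor value are hypothetical choices made for the example.

```python
import numpy as np


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with an FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def envelope_power(x, fs, band):
    """AC envelope power inside a modulation band, normalized by DC power."""
    env = hilbert_envelope(x)
    dc = env.mean()
    spectrum = np.fft.rfft(env - dc)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # One-sided spectrum: factor 2 restores the power of negative frequencies.
    ac_power = 2.0 * np.sum(np.abs(spectrum[in_band]) ** 2) / len(env) ** 2
    return ac_power / dc ** 2


def snr_env(noisy_speech, noise, fs, band=(2.0, 8.0), floor=1e-3):
    """Assumed SNRenv: (P_env of mixture - P_env of noise) / P_env of noise."""
    p_mix = envelope_power(noisy_speech, fs, band)
    p_noise = max(envelope_power(noise, fs, band), floor)
    # The floor (an arbitrary value here) keeps the estimate positive.
    p_speech = max(p_mix - p_noise, floor)
    return p_speech / p_noise


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    carrier = np.sin(2 * np.pi * 1000 * t)
    # Toy stand-ins: the "mixture" is modulated more deeply than the "noise".
    mixture = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * carrier
    noise = (1 + 0.25 * np.cos(2 * np.pi * 4 * t)) * carrier
    print(10 * np.log10(snr_env(mixture, noise, fs)), "dB")
```

In this toy case the two signals differ only in modulation depth, so the metric reflects how much envelope power the mixture carries beyond the noise alone. A full model would repeat the calculation across audio and modulation channels before combining the results.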