Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London WC1N 1PF, United Kingdom.
Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.
J Acoust Soc Am. 2019 Oct;146(4):2562. doi: 10.1121/1.5129050.
Four existing speech intelligibility models with different theoretical assumptions were used to predict previously published behavioural data. Those data showed that complex tones with pitch-related periodicity are far less effective maskers of speech than aperiodic noise. This so-called masker-periodicity benefit (MPB) far exceeded the fluctuating-masker benefit (FMB) obtained from slow masker envelope fluctuations. In contrast, the normal-hearing listeners hardly benefited from periodicity in the target speech. All tested models consistently underestimated the MPB and FMB, and most of them also overestimated the intelligibility of vocoded speech. To understand these shortcomings, the internal signal representations of the models were analysed in detail. The best-performing model, the correlation-based version of the speech-based envelope power spectrum model (sEPSM), combined an auditory processing front end with a modulation filterbank and a correlation-based back end. This model was then modified to further improve the predictions. The resulting second version of the sEPSM (sEPSM2) outperformed the original model for all tested maskers and accounted for about half of the MPB, which can be attributed to reduced modulation masking caused by the periodic maskers. However, as the sEPSM2 failed to account for the other half of the MPB, the results also indicate that future models should consider the contribution of pitch-related effects, such as enhanced stream segregation, to further improve their predictive power.
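As a rough illustration of what a correlation-based back end does, the Python sketch below compares clean-speech and noisy-speech internal representations channel by channel using a Pearson correlation, then averages the per-channel values into a single similarity score. This is a simplified assumption and not the sEPSM implementation. The actual model works on short-time segments of auditory and modulation filterbank outputs and maps its metric to intelligibility. The function names `pearson` and `correlation_backend`, the list-of-channels input format, and the plain averaging across channels are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A flat (zero-variance) channel carries no information; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def correlation_backend(clean_channels, noisy_channels):
    """Hypothetical back end: mean per-channel correlation between representations.

    Each argument is a list of channels, and each channel is a list of
    envelope samples at the output of one (assumed) modulation filter.
    A higher score means the noisy representation is more similar to the
    clean one, which is assumed to map to higher predicted intelligibility.
    """
    if len(clean_channels) != len(noisy_channels):
        raise ValueError("channel counts differ")
    scores = [pearson(c, n) for c, n in zip(clean_channels, noisy_channels)]
    # Simplifying assumption: equal weighting of all channels.
    return mean(scores)
```

A real model would also need a mapping from this score to percent correct or to a speech reception threshold. That step is omitted here because the abstract does not specify it.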