Speech intelligibility prediction based on modulation frequency-selective processing.

Affiliations

Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kgs. Lyngby 2800, Denmark; Cognitive Systems Section, Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs. Lyngby 2800, Denmark.

Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kgs. Lyngby 2800, Denmark.

Publication information

Hear Res. 2022 Dec;426:108610. doi: 10.1016/j.heares.2022.108610. Epub 2022 Sep 13.

DOI:10.1016/j.heares.2022.108610

PMID:36163219

Abstract

Speech intelligibility models can provide insights into the auditory processes involved in human speech perception and communication. One successful approach to modeling speech intelligibility is based on analyzing the amplitude modulations present in speech and in competing interferers. This review covers speech intelligibility models that include a modulation-frequency-selective processing stage, i.e., a modulation filterbank, as part of their front end. It discusses the speech-based envelope power spectrum model [sEPSM; Jørgensen and Dau (2011). J. Acoust. Soc. Am. 130(3), 1475-1487]; several variants of the sEPSM, including modifications with respect to temporal resolution, spectro-temporal processing, and binaural processing; and the speech-based computational auditory signal processing and perception model [sCASP; Relaño-Iborra et al. (2019). J. Acoust. Soc. Am. 146(5), 3306-3317], which is based on an established auditory signal detection and masking model. The key processing stages of these models for predicting speech intelligibility across a variety of acoustic conditions are addressed in relation to competing modeling approaches. The strengths and weaknesses of the modulation-based analysis are outlined and perspectives are presented, particularly in connection with the challenge of predicting the consequences of individual hearing loss on speech intelligibility.
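
As a rough illustration of what a modulation-frequency-selective front end computes, the following Python sketch extracts the envelope of a signal, measures its normalized power in a few modulation bands, and forms an envelope-domain signal-to-noise ratio. It is a heavily simplified sketch and not the implementation of any model named in the abstract. The assumptions are: a single audio channel with no peripheral (e.g., gammatone) filtering, ideal one-octave rectangular bands in place of real modulation filters, an arbitrary power floor chosen only to avoid division by zero, and no back end that maps the ratio to intelligibility. All function names and parameter values are hypothetical.

```python
import numpy as np


def envelope(x):
    """Hilbert envelope of a real signal, computed with an FFT (NumPy only)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Weights that zero the negative frequencies and double the positive ones.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))


def modulation_band_power(env, fs, centers):
    """AC envelope power per one-octave band, normalized by the DC power.

    Assumption: ideal rectangular bands stand in for a modulation filterbank.
    """
    n = len(env)
    spec = np.fft.rfft(env)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = 2.0 * np.abs(spec) ** 2 / n ** 2  # mean power per component
    power[0] = 0.0                            # exclude the DC component
    dc_power = np.mean(env) ** 2
    out = []
    for fc in centers:
        lo, hi = fc / np.sqrt(2.0), fc * np.sqrt(2.0)
        band = (freqs >= lo) & (freqs < hi)
        out.append(power[band].sum() / dc_power)
    return np.array(out)


def snr_env(mixture, noise, fs, centers, floor=1e-3):
    """Envelope-domain SNR per modulation band: (P_mix - P_noise) / P_noise.

    Assumption: `floor` is an illustrative lower limit, not a published value.
    """
    p_mix = modulation_band_power(envelope(mixture), fs, centers)
    p_noise = modulation_band_power(envelope(noise), fs, centers)
    p_signal = np.maximum(p_mix - p_noise, floor)
    return p_signal / np.maximum(p_noise, floor)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs                      # 1 s of samples
    carrier = np.cos(2 * np.pi * 1000 * t)      # unmodulated "interferer"
    mixture = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * carrier  # 4 Hz modulation
    centers = [1, 2, 4, 8, 16]                  # modulation band centers in Hz
    print(snr_env(mixture, carrier, fs, centers))
```

In this toy case the 4 Hz band dominates, because the only modulation that the mixture carries beyond the interferer lies at 4 Hz. A model of the kind the review covers would apply such an analysis within each peripheral auditory channel and combine the results in a decision stage.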
