The role of short-time intensity and envelope power for speech intelligibility and psychoacoustic masking.

Authors

Thomas Biberger, Stephan D. Ewert

Affiliation

Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, 26111 Oldenburg, Germany.

Publication

J Acoust Soc Am. 2017 Aug;142(2):1098. doi: 10.1121/1.4999059.

PMID:28863616

Abstract

The generalized power spectrum model [GPSM; Biberger and Ewert (2016). J. Acoust. Soc. Am. 140, 1023-1038], which combines the "classical" concept of the power spectrum model (PSM) with the envelope power spectrum model (EPSM), was shown to account for several psychoacoustic and speech intelligibility (SI) experiments. The PSM path of the model uses long-time power signal-to-noise ratios (SNRs), while the EPSM path uses short-time envelope power SNRs. A systematic comparison of existing SI models for several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers [Schubotz et al. (2016). J. Acoust. Soc. Am. 140, 524-540] showed the importance of short-time power features. Conversely, Jørgensen et al. [(2013). J. Acoust. Soc. Am. 134, 436-446] demonstrated that short-time envelope power SNRs have higher predictive power than power SNRs in conditions with reverberation and spectral subtraction. Here, the GPSM was extended to use short-time power SNRs and was shown to account for all psychoacoustic and SI data of the three studies mentioned. The best processing strategy was to use either power SNRs or envelope power SNRs exclusively, depending on the experimental task. By analyzing both domains, the proposed model may provide a useful tool for clarifying the respective contributions of amplitude modulation masking and energetic masking.
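The three SNR types contrasted in the abstract (long-time power SNR, short-time power SNR, and short-time envelope power SNR) can be illustrated with the minimal sketch below. It is not the GPSM implementation. The auditory filterbank, modulation filterbank, and decision stage are omitted, and everything runs on a single broadband channel. All function names are hypothetical. The frame lengths (20 ms for power, 500 ms for envelope power) are arbitrary illustrative choices. Envelope power is assumed to be the AC power of the Hilbert envelope normalized by its DC power. The envelope SNR follows the sEPSM-style definition (P_env,mix - P_env,masker) / P_env,masker, which is likewise an assumption here.

```python
import numpy as np


def analytic_envelope(x):
    """Hilbert envelope |x + j*H{x}|, computed with the FFT (numpy only)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep DC
    if n % 2 == 0:
        weights[n // 2] = 1.0  # keep the Nyquist bin
        weights[1:n // 2] = 2.0  # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    # Negative frequencies are zeroed, which yields the analytic signal.
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def frames(x, fs, win_s):
    """Split x into non-overlapping frames of win_s seconds (tail is dropped)."""
    n = int(round(win_s * fs))
    count = len(x) // n
    return x[:count * n].reshape(count, n)


def power_snr_db(target, masker):
    """Long-time power SNR in dB (PSM-style: one value per signal pair)."""
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(masker ** 2))


def short_time_power_snrs_db(target, masker, fs, win_s=0.02, eps=1e-12):
    """Per-frame power SNRs in dB; the 20 ms frame is a hypothetical choice."""
    p_t = np.sum(frames(target, fs, win_s) ** 2, axis=1)
    p_m = np.sum(frames(masker, fs, win_s) ** 2, axis=1)
    return 10.0 * np.log10((p_t + eps) / (p_m + eps))


def envelope_power(env):
    """AC envelope power normalized by DC power: var(env) / mean(env)**2."""
    return np.var(env) / np.mean(env) ** 2


def short_time_envelope_snrs_db(mixture, masker, fs, win_s=0.5, floor=1e-3):
    """Per-frame SNR_env = (P_env,mix - P_env,masker) / P_env,masker in dB.

    Negative differences are clipped to floor * P_env,masker so that the
    logarithm stays finite; the 500 ms frame is a hypothetical choice.
    """
    env_mix = frames(analytic_envelope(mixture), fs, win_s)
    env_msk = frames(analytic_envelope(masker), fs, win_s)
    p_mix = np.array([envelope_power(e) for e in env_mix])
    p_msk = np.maximum([envelope_power(e) for e in env_msk], 1e-12)
    snr_env = np.maximum(p_mix - p_msk, floor * p_msk) / p_msk
    return 10.0 * np.log10(snr_env)
```

Because envelope power is normalized by the DC component, it does not change with overall level. Scaling the masker therefore changes the power SNRs but leaves P_env,masker unchanged, which is one way to see why the two domains can make different predictions. How per-frame values are combined across frames and auditory channels before a decision is not described in the abstract, so that step is left out.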
