Effects of manipulating the signal-to-noise envelope power ratio on speech intelligibility.

Author information

Jørgensen Søren, Decorsière Rémi, Dau Torsten

Affiliation

Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.

Publication information

J Acoust Soc Am. 2015 Mar;137(3):1401-10. doi: 10.1121/1.4908240.

PMID:25786952

Abstract

Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] suggested a metric for speech intelligibility prediction based on the signal-to-noise envelope power ratio (SNRenv), calculated at the output of a modulation-frequency selective process. In the framework of the speech-based envelope power spectrum model (sEPSM), the SNRenv was demonstrated to account for speech intelligibility data in various conditions with linearly and nonlinearly processed noisy speech, as well as for conditions with stationary and fluctuating interferers. Here, the relation between the SNRenv and speech intelligibility was investigated further by systematically varying the modulation power of either the speech or the noise before mixing the two components, while keeping the overall power ratio of the two components constant. A good correspondence between the data and the corresponding sEPSM predictions was obtained when the noise was manipulated and mixed with the unprocessed speech, consistent with the hypothesis that SNRenv is indicative of speech intelligibility. However, discrepancies between data and predictions occurred for conditions where the speech was manipulated and the noise left untouched. In these conditions, distortions introduced by the applied modulation processing were detrimental for speech intelligibility, but not reflected in the SNRenv metric, thus representing a limitation of the modeling framework.
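To make the SNRenv idea concrete, here is a minimal Python/NumPy sketch of the general recipe, as we understand it from Jørgensen and Dau (2011). It extracts the envelopes of the noisy mixture and of the noise alone, measures the AC envelope power (normalized by the squared envelope mean) in each modulation band, forms SNRenv = (P_env,S+N − P_env,N) / P_env,N per band, and combines the bands as the square root of the summed squares. This is not the published sEPSM implementation, and several parts are simplifying assumptions: a single broadband audio channel instead of an auditory filterbank, the Hilbert envelope, idealized rectangular octave-wide modulation bands centred at 2 to 64 Hz, and an arbitrary per-band floor of 1e-3. The function names and the amplitude-modulated-noise "speech" stand-in are hypothetical.

```python
import numpy as np


def hilbert_envelope(x: np.ndarray) -> np.ndarray:
    """Magnitude of the analytic signal, computed with an FFT (no SciPy needed)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def envelope_power(env: np.ndarray, fs: float, f_lo: float, f_hi: float) -> float:
    """AC envelope power in [f_lo, f_hi) Hz, normalized by the squared envelope mean.

    Uses an idealized rectangular band in the frequency domain (an assumption;
    the sEPSM uses a modulation filterbank rather than ideal bands).
    """
    n = len(env)
    spectrum = np.fft.rfft(env)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs < f_hi)
    # Parseval for a real signal: each one-sided bin (excluding DC/Nyquist) counts twice.
    ac_power = 2.0 * np.sum(np.abs(spectrum[band]) ** 2) / n**2
    return ac_power / np.mean(env) ** 2


def snr_env(mix, noise, fs, centres=(2, 4, 8, 16, 32, 64), floor=1e-3):
    """Combined SNRenv of a noisy mixture, given the noise alone (hypothetical helper)."""
    env_mix = hilbert_envelope(mix)
    env_noise = hilbert_envelope(noise)
    per_band = []
    for fc in centres:
        lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)  # octave-wide band around fc
        p_mix = envelope_power(env_mix, fs, lo, hi)
        p_noise = envelope_power(env_noise, fs, lo, hi)
        # Envelope power attributed to the target, relative to the noise envelope power;
        # the floor (an assumed value) keeps the ratio positive.
        per_band.append(max((p_mix - p_noise) / p_noise, floor))
    # Combine modulation bands as the square root of the summed squares.
    return float(np.sqrt(np.sum(np.square(per_band))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000
    t = np.arange(2 * fs) / fs
    noise = rng.standard_normal(t.size)
    # Stand-in "speech": noise with 4 Hz sinusoidal amplitude modulation (an assumption).
    target = rng.standard_normal(t.size) * (1 + np.sin(2 * np.pi * 4 * t))
    target *= np.sqrt(np.mean(noise**2) / np.mean(target**2))  # 0 dB overall SNR
    unmodulated = rng.standard_normal(t.size)  # same overall power, no modulation
    print("modulated target:  ", snr_env(target + noise, noise, fs))
    print("unmodulated target:", snr_env(unmodulated + noise, noise, fs))
```

In the usage example, both targets have roughly the same overall power relative to the noise, yet only the modulated one should yield a large SNRenv. This illustrates that the metric tracks the ratio of envelope (modulation) power rather than the overall power ratio, which is the quantity the study held fixed while manipulating modulation power.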
