Yu Chengzhu, Wójcicki Kamil K, Loizou Philipos C, Hansen John H L, Johnson Michael T
Department of Electrical Engineering, Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas 75083.
Speech and Signal Processing Laboratory, Marquette University, 1515 West Wisconsin Avenue, Milwaukee, Wisconsin 53201-1881.
J Acoust Soc Am. 2014 May;135(5):3007-16. doi: 10.1121/1.4869088.
Recent studies on binary masking techniques assume that each time-frequency (T-F) unit contributes equally to the overall intelligibility of speech. The present study demonstrates that the importance of each T-F unit to speech intelligibility varies with speech content. Specifically, T-F units are categorized into two classes: speech-present T-F units and speech-absent T-F units. Results indicate that the importance of each speech-present T-F unit to speech intelligibility is highly related to the loudness of its target component, while the importance of each speech-absent T-F unit varies according to the loudness of its masker component. Two types of mask error are also considered: miss errors and false alarm errors. Consistent with previous work, false alarm errors are shown to be more harmful to speech intelligibility than miss errors when the mixture signal-to-noise ratio (SNR) is below 0 dB. However, the relative importance of the two error types depends on the SNR level of the input speech signal. Based on these observations, a mask-based objective measure, the loudness weighted hit-false, is proposed for predicting speech intelligibility. The proposed objective measure shows significantly higher correlation with intelligibility than two existing mask-based objective measures.
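The abstract does not give the formula for the loudness weighted hit-false measure, so the following is only a minimal sketch of the general idea under stated assumptions. It computes a hit rate over speech-present T-F units and a false alarm rate over speech-absent T-F units, and lets each unit count in proportion to a weight. The function name `weighted_hit_minus_fa`, the parameters `target_weight` and `masker_weight`, and the normalization are all hypothetical; in particular, the weights merely stand in for the target and masker loudness values, which in practice would come from a loudness model that is not shown here.

```python
import numpy as np


def weighted_hit_minus_fa(ideal_mask, estimated_mask,
                          target_weight=None, masker_weight=None):
    """Weighted hit rate minus weighted false alarm rate (illustrative only).

    ideal_mask, estimated_mask: binary arrays over T-F units (1 = keep unit).
    target_weight: assumed per-unit weight for speech-present units.
    masker_weight: assumed per-unit weight for speech-absent units.
    With both weights left as None, every unit counts equally.
    """
    ideal = np.asarray(ideal_mask, dtype=bool)
    est = np.asarray(estimated_mask, dtype=bool)
    if ideal.shape != est.shape:
        raise ValueError("masks must have the same shape")

    # Uniform weights give the plain, unweighted hit-minus-false-alarm score.
    w_t = (np.ones(ideal.shape) if target_weight is None
           else np.asarray(target_weight, dtype=float))
    w_m = (np.ones(ideal.shape) if masker_weight is None
           else np.asarray(masker_weight, dtype=float))

    speech_present = ideal
    speech_absent = ~ideal

    # Hit: a speech-present unit that the estimated mask retains.
    hit_total = w_t[speech_present].sum()
    hit = w_t[speech_present & est].sum() / hit_total if hit_total > 0 else 0.0

    # False alarm: a speech-absent unit that the estimated mask retains.
    fa_total = w_m[speech_absent].sum()
    fa = w_m[speech_absent & est].sum() / fa_total if fa_total > 0 else 0.0

    return float(hit - fa)


# Toy example with four T-F units: one hit, one miss, one false alarm.
ideal = np.array([[1, 1, 0, 0]])
estimated = np.array([[1, 0, 1, 0]])

print(weighted_hit_minus_fa(ideal, estimated))  # 0.0

# Made-up weights: the retained target unit is "loud", the leaked masker "quiet".
t_w = np.array([[3.0, 1.0, 0.0, 0.0]])
m_w = np.array([[0.0, 0.0, 1.0, 3.0]])
print(weighted_hit_minus_fa(ideal, estimated, t_w, m_w))  # 0.5
```

In the toy example the unweighted score is 0.0 because half of the speech-present units are hit and half of the speech-absent units are false alarms. With the made-up weights, the same mask scores 0.5, which illustrates how weighting can separate masks that an equal-contribution measure treats as identical.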