Lu Yang, Loizou Philipos C
Department of Electrical Engineering, the University of Texas at Dallas, Richardson, TX, 75080, USA.
IEEE Trans Audio Speech Lang Process. 2011 Jul 1;19(5):1123-1137. doi: 10.1109/TASL.2010.2082531.
Statistical estimators of the magnitude-squared spectrum are derived under the assumption that the magnitude-squared spectrum of the noisy speech signal can be computed as the sum of the (clean) signal and noise magnitude-squared spectra. Maximum a posteriori (MAP) and minimum mean square error (MMSE) estimators are derived based on a Gaussian statistical model. The gain function of the MAP estimator was found to be identical to the gain function used in the ideal binary mask (IdBM), which is widely used in computational auditory scene analysis (CASA). As such, it is binary: it takes the value 1 if the local SNR exceeds 0 dB and the value 0 otherwise. By modeling the local instantaneous SNR as an F-distributed random variable, soft masking methods were derived that incorporate SNR uncertainty. In particular, the soft masking method that weights the noisy magnitude-squared spectrum by the a priori probability that the local SNR exceeds 0 dB was shown to be identical to the Wiener gain function. Results indicated that the proposed estimators yielded significantly better speech quality than the conventional MMSE spectral power estimators, with lower residual noise and lower speech distortion.
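The contrast between the binary (MAP) gain and the soft (Wiener) gain described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sample values, and the use of the standard Wiener form xi / (1 + xi), with xi taken as the a priori SNR in linear units, are assumptions made for illustration only.

```python
# Illustrative sketch only: names and sample values are hypothetical.

def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)

def binary_mask_gain(snr_linear):
    """Binary gain: 1 if the local SNR exceeds 0 dB (ratio > 1), else 0."""
    return 1.0 if snr_linear > 1.0 else 0.0

def wiener_gain(snr_linear):
    """Standard Wiener gain xi / (1 + xi); assumes xi is the a priori SNR."""
    return snr_linear / (1.0 + snr_linear)

def apply_gain(noisy_power, snrs, gain_fn):
    """Weight each noisy magnitude-squared bin by the gain for its SNR."""
    return [gain_fn(s) * y2 for y2, s in zip(noisy_power, snrs)]

# Hypothetical noisy magnitude-squared values and per-bin SNRs.
noisy_power = [4.0, 1.0, 0.25]
snr_db = [10.0, 0.0, -10.0]
snr_lin = [db_to_linear(s) for s in snr_db]

hard = apply_gain(noisy_power, snr_lin, binary_mask_gain)
soft = apply_gain(noisy_power, snr_lin, wiener_gain)
```

With these values the binary gain keeps only the 10 dB bin and zeroes the other two, including the bin at exactly 0 dB, since the SNR must exceed 0 dB. The Wiener gain attenuates every bin smoothly, halving the bin at 0 dB.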