Suh Youngjoo, Kim Hoirin
School of Electrical Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea.
J Acoust Soc Am. 2018 Feb;143(2):677. doi: 10.1121/1.5022800.
Histogram equalization is an efficient feature normalization technique for noise-robust automatic speech recognition. However, its performance degrades when some fundamental conditions are not satisfied in the test environment. To remedy these limitations of the original histogram equalization methods, a class-based histogram equalization approach has been proposed. Although this approach showed substantial performance improvement in noisy environments, it still suffers from performance degradation caused by overfitting when test data are insufficient. To address this issue, the proposed histogram equalization technique employs Bayesian estimation when estimating the test cumulative distribution function. A previous study on the Aurora-4 task reported that the proposed approach provided substantial performance gains in speech recognition systems based on Gaussian mixture model-hidden Markov model acoustic modeling. In this work, the proposed approach was examined in speech recognition systems based on the deep neural network-hidden Markov model (DNN-HMM), the current mainstream speech recognition approach, where it also showed meaningful performance improvement over the conventional maximum likelihood estimation-based method. The fusion of the proposed features with mel-frequency cepstral coefficients provided additional performance gains in DNN-HMM systems, which otherwise suffer from performance degradation in the clean test condition.
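The contrast between maximum likelihood and Bayesian estimation of the test cumulative distribution function can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract gives no equations, so the sketch assumes a histogram-based CDF, a Dirichlet prior whose mean is the reference (training) histogram, and a hypothetical prior weight `tau`. It also omits the class-based part of the method. All function names are illustrative.

```python
import numpy as np

def bin_probs_ml(x, edges):
    """Maximum likelihood bin probabilities: plain relative frequencies."""
    counts, _ = np.histogram(np.clip(x, edges[0], edges[-1]), bins=edges)
    return counts / counts.sum()

def bin_probs_bayes(x, edges, prior_probs, tau):
    """Posterior-mean bin probabilities under an ASSUMED Dirichlet prior.

    prior_probs is the prior mean and tau its weight in pseudo-counts.
    With few test frames the estimate stays close to the prior; with
    many frames it approaches the maximum likelihood estimate.
    """
    counts, _ = np.histogram(np.clip(x, edges[0], edges[-1]), bins=edges)
    return (counts + tau * prior_probs) / (counts.sum() + tau)

def equalize(x, edges, test_probs, ref_samples):
    """Map each test value through the test CDF, then through the
    inverse reference CDF (empirical quantiles of ref_samples)."""
    x = np.clip(x, edges[0], edges[-1])
    test_cdf = np.cumsum(test_probs)
    # Index of the histogram bin that contains each value.
    idx = np.searchsorted(edges, x, side="right") - 1
    idx = np.clip(idx, 0, len(test_probs) - 1)
    # Guard against cumulative sums that exceed 1 by rounding error.
    u = np.clip(test_cdf[idx], 0.0, 1.0)
    return np.quantile(ref_samples, u)

# Synthetic one-dimensional features: a large clean reference set and
# a short, shifted "noisy" test utterance of only 20 frames.
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 5000)
test = 0.5 * rng.normal(0.0, 1.0, 20) + 2.0

edges = np.linspace(-6.0, 6.0, 41)
prior = bin_probs_ml(ref, edges)          # assumed prior mean
p_ml = bin_probs_ml(test, edges)
p_bayes = bin_probs_bayes(test, edges, prior, tau=10.0)

y_ml = equalize(test, edges, p_ml, ref)
y_bayes = equalize(test, edges, p_bayes, ref)
```

In this sketch, setting `tau` to zero recovers the maximum likelihood estimate, and larger values pull the test CDF toward the reference distribution, which is one simple way to limit overfitting when the test utterance is short.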