Agrawal Purvi, Ganapathy Sriram
Indian Institute of Science, Bangalore, India.
J Acoust Soc Am. 2017 Sep;142(3):1686. doi: 10.1121/1.5001926.
The modulation filtering approach to robust automatic speech recognition (ASR) enhances perceptually relevant regions of the modulation spectrum while suppressing regions that are susceptible to noise. This paper proposes a data-driven, unsupervised scheme for learning modulation filters using a convolutional restricted Boltzmann machine. The initial filter is learned from the speech spectrogram, and subsequent filters are learned from residual spectrograms. The modulation-filtered spectrograms are used as features in ASR experiments on noisy and reverberant speech, where they provide significant improvements over other robust features. The application of the proposed method to semi-supervised learning is also investigated.
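The cascade described in the abstract (learn a filter, filter the spectrogram, then learn the next filter from what remains) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the residual is computed, so the sketch assumes it is the input minus the filtered output. The function learn_filter is a hypothetical placeholder that returns a fixed averaging kernel where the paper trains a convolutional restricted Boltzmann machine, and all names, kernel sizes, and the toy spectrogram are illustrative only.

```python
def filter_2d(spec, kernel):
    """Same-size 2-D filtering (cross-correlation) with zero padding."""
    n_rows, n_cols = len(spec), len(spec[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            acc = 0.0
            for a in range(k_rows):
                for b in range(k_cols):
                    r, c = i + a - r_off, j + b - c_off
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        acc += kernel[a][b] * spec[r][c]
            out[i][j] = acc
    return out


def residual(spec, filtered):
    """ASSUMPTION: residual = input spectrogram minus its filtered version."""
    return [[s - f for s, f in zip(s_row, f_row)]
            for s_row, f_row in zip(spec, filtered)]


def learn_filter(spec, shape):
    """Hypothetical stand-in for CRBM training.

    It ignores the data and returns a fixed averaging (low-pass) kernel;
    the paper learns the kernel from the spectrogram instead.
    """
    rows, cols = shape
    weight = 1.0 / (rows * cols)
    return [[weight] * cols for _ in range(rows)]


def cascade(spec, n_filters=2, shape=(3, 3)):
    """Learn filters one at a time, each from the previous residual."""
    filters, outputs = [], []
    current = spec
    for _ in range(n_filters):
        kernel = learn_filter(current, shape)
        filtered = filter_2d(current, kernel)
        filters.append(kernel)
        outputs.append(filtered)
        current = residual(current, filtered)
    return filters, outputs, current


# Toy "spectrogram": 4 frequency bands by 6 time frames (made-up values).
toy_spec = [
    [1.0, 2.0, 3.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 1.0, 0.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    [0.0, 3.0, 0.0, 3.0, 0.0, 3.0],
]

filters, outputs, final_residual = cascade(toy_spec)
```

Under the subtraction assumption, the filtered outputs and the final residual add back up to the original spectrogram, which is a convenient sanity check on the cascade. In an ASR front end, the filtered spectrograms in outputs would be the candidate features.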