Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.

Authors

Schädler Marc René, Kollmeier Birger

Affiliation

Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany.

Publication information

J Acoust Soc Am. 2015 Apr;137(4):2047-59. doi: 10.1121/1.4916618.

DOI:10.1121/1.4916618

PMID:25920855

Abstract

To test whether simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional Gabor filter bank and a temporal one-dimensional Gabor filter bank. A feature set extracted with these separate spectral and temporal modulation filter banks, the separate Gabor filter bank (SGBFB) features, was introduced and evaluated on the CHiME (Computational Hearing in Multisource Environments) keywords-in-noise recognition task. From the perspective of robust ASR, the results showed that spectral and temporal processing can be performed independently and are not required to interact with each other. Using SGBFB features permitted the signal-to-noise ratio (SNR) to be lowered by 1.2 dB while still performing as well as the GBFB-based reference system, which corresponds to a relative improvement in word error rate of 12.8%. Additionally, the real-time factor of the spectro-temporal processing could be reduced by more than an order of magnitude. Compared to human listeners, the SNR needed to be 13 dB higher when using Mel-frequency cepstral coefficient features, 11 dB higher when using GBFB features, and 9 dB higher when using SGBFB features to achieve the same recognition performance.
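
The separability idea in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: the filter shape (a Hann envelope times a complex carrier), the helper names `gabor_1d`, `conv2d_full`, and `conv_separable`, the modulation frequencies, the filter lengths, and the random stand-in spectrogram are all assumptions chosen for illustration. The sketch only checks the general identity that convolving with a 2D kernel formed as the outer product of two 1D kernels gives the same result as filtering along the frequency axis and then along the time axis.

```python
import numpy as np

def gabor_1d(omega, half_width):
    """Hypothetical 1D Gabor-like filter: Hann envelope times a complex carrier."""
    n = np.arange(-half_width, half_width + 1)
    envelope = 0.5 + 0.5 * np.cos(np.pi * n / (half_width + 1))
    return envelope * np.exp(1j * omega * n)

def conv2d_full(image, kernel):
    """Direct full 2D convolution (slow; used only as a reference)."""
    rows = image.shape[0] + kernel.shape[0] - 1
    cols = image.shape[1] + kernel.shape[1] - 1
    out = np.zeros((rows, cols), dtype=complex)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out[i:i + image.shape[0], j:j + image.shape[1]] += kernel[i, j] * image
    return out

def conv_separable(image, k_spec, k_temp):
    """Filter along axis 0 (frequency), then along axis 1 (time)."""
    tmp = np.apply_along_axis(lambda col: np.convolve(col, k_spec), 0, image)
    return np.apply_along_axis(lambda row: np.convolve(row, k_temp), 1, tmp)

# Random stand-in for a log Mel-spectrogram (channels x frames); sizes are assumptions.
rng = np.random.default_rng(0)
spec = rng.standard_normal((23, 100))

k_spec = gabor_1d(0.5, 5)    # spectral filter, 11 taps (assumed)
k_temp = gabor_1d(0.3, 12)   # temporal filter, 25 taps (assumed)
kernel_2d = np.outer(k_spec, k_temp)  # separable 2D kernel, 11 x 25

full = conv2d_full(spec, kernel_2d)
sep = conv_separable(spec, k_spec, k_temp)
max_abs_diff = np.max(np.abs(full - sep))
print(full.shape, sep.shape, max_abs_diff)
```

With these assumed sizes, the 2D kernel needs 11 x 25 = 275 multiplications per output sample, while the two 1D passes need 11 + 25 = 36. The abstract does not state how the reported real-time-factor reduction was obtained, so this count is only a property of the sketch.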
