Do we need STRFs for cocktail parties? On the relevance of physiologically motivated features for human speech perception derived from automatic speech recognition.

Affiliation

Medizinische Physik, Carl von Ossietzky University, Oldenburg, D-26111, Germany.

Publication information

Adv Exp Med Biol. 2013;787:333-41. doi: 10.1007/978-1-4614-1590-9_37.

Abstract

Complex auditory features such as spectro-temporal receptive fields (STRFs) derived from cortical auditory neurons appear to be advantageous in sound processing. However, their physiological and functional relevance is still unclear. To assess the utility of such feature processing for speech reception in noise, automatic speech recognition (ASR) performance using feature sets obtained from physiological and/or psychoacoustical data and models is compared to human performance. Time-frequency representations with nonlinear compression are compared with standard features such as mel-scaled spectrograms. Both alternatives serve as input to model estimators that infer spectro-temporal filters (and a subsequent nonlinearity) from physiological measurements in auditory brain areas of zebra finches. Alternatively, a filter bank of two-dimensional Gabor functions is employed, which covers a wide range of modulation frequencies in the time and frequency domains. The results indicate a clear increase in ASR robustness using complex features (modeled by Gabor functions), while the benefit from physiologically derived STRFs is limited. In all cases, the use of power-normalized spectral representations increases performance, indicating that substantial dynamic compression is advantageous for level-independent pattern recognition. The methods employed may help physiologists look for more relevant STRFs and better understand specific differences in estimated STRFs.
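The abstract contrasts estimated STRFs with a bank of two-dimensional Gabor filters and links strong dynamic compression to level independence. The Python sketch below illustrates both ideas on a toy input; it is not the authors' implementation. The Hann envelope, the DC-removal step, the modulation units (cycles per channel and cycles per frame), the kernel sizes, the choice of logarithmic compression, and the synthetic ripple spectrogram are assumptions chosen for illustration, and the helper names (`gabor_kernel`, `filter_valid`, `log_compress`) are hypothetical. It builds one spectro-temporal Gabor filter and applies it to a log-compressed spectrogram before and after a +20 dB gain.

```python
import math

# Minimal illustrative sketch: hypothetical helpers and assumed parameters, not the paper's code.


def hann(length):
    """Symmetric Hann window with non-zero end points (peak 1 at the centre for odd length)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (length + 1)) for i in range(length)]


def gabor_kernel(spectral_mod, temporal_mod, size_k, size_n):
    """Real 2-D spectro-temporal Gabor kernel on a (channel x frame) grid.

    spectral_mod: modulation frequency in cycles per channel (assumed unit)
    temporal_mod: modulation frequency in cycles per frame (assumed unit)
    size_k, size_n: odd kernel extents in channels and frames
    """
    wk, wn = hann(size_k), hann(size_n)
    kc, nc = size_k // 2, size_n // 2
    env = [[wk[k] * wn[n] for n in range(size_n)] for k in range(size_k)]
    ker = [[env[k][n] * math.cos(2 * math.pi * (spectral_mod * (k - kc) + temporal_mod * (n - nc)))
            for n in range(size_n)] for k in range(size_k)]
    # Assumed DC removal: subtract a scaled copy of the envelope so the kernel sums to zero.
    ratio = sum(map(sum, ker)) / sum(map(sum, env))
    return [[ker[k][n] - ratio * env[k][n] for n in range(size_n)] for k in range(size_k)]


def filter_valid(spec, kernel):
    """Valid-mode 2-D correlation of a (channel x frame) representation with a kernel."""
    kk, kn = len(kernel), len(kernel[0])
    return [[sum(spec[k + i][n + j] * kernel[i][j] for i in range(kk) for j in range(kn))
             for n in range(len(spec[0]) - kn + 1)]
            for k in range(len(spec) - kk + 1)]


def log_compress(power, floor=1e-10):
    """Logarithmic compression: a broadband gain becomes an additive constant."""
    return [[math.log(max(p, floor)) for p in row] for row in power]


# Synthetic power spectrogram: 23 channels x 60 frames carrying one spectro-temporal ripple.
power = [[1.0 + 0.5 * math.cos(2 * math.pi * (0.1 * k + 0.05 * n)) for n in range(60)]
         for k in range(23)]
louder = [[100.0 * p for p in row] for row in power]  # same signal, +20 dB

kernel = gabor_kernel(spectral_mod=0.1, temporal_mod=0.05, size_k=11, size_n=21)
quiet_out = filter_valid(log_compress(power), kernel)
loud_out = filter_valid(log_compress(louder), kernel)

# Largest change in filter output caused by the gain, and the strongest response overall.
gap = max(abs(a - b) for ra, rb in zip(quiet_out, loud_out) for a, b in zip(ra, rb))
peak = max(abs(v) for row in quiet_out for v in row)
print(f"peak response {peak:.2f}, max change after +20 dB: {gap:.1e}")
```

Because log compression turns a broadband gain into an additive constant and the kernel sums to zero, the filter output is unchanged by the gain (up to rounding), whereas on an uncompressed spectrogram the same output would scale by a factor of 100. A full filter bank would repeat this for many pairs of spectral and temporal modulation frequencies.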
