Sparse gammatone signal model optimized for English speech does not match the human auditory filters.

Authors

Strahl Stefan, Mertins Alfred

Affiliation

International Graduate School for Neurosensory Science and Systems, Carl von Ossietzky University, D-26111 Oldenburg, Germany.

Publication

Brain Res. 2008 Jul 18;1220:224-33. doi: 10.1016/j.brainres.2007.11.059. Epub 2007 Dec 7.

Abstract

Interest in sparse signal coding has grown in recent years, driven by evidence that neurosensory systems use sparse signal representations and by the improved performance of signal processing algorithms that use sparse signal models. For natural audio signals such as speech and environmental sounds, gammatone atoms have been derived as expansion functions that generate a nearly optimal sparse signal model (Smith, E., Lewicki, M., 2006. Efficient auditory coding. Nature 439, 978-982). Furthermore, gammatone functions are established models of the human auditory filters. Thus far, practical application of a sparse gammatone signal model has been prevented by the fact that deriving the sparsest representation is, in general, computationally intractable. In this paper, we apply an accelerated version of the matching pursuit algorithm for gammatone dictionaries that allows real-time and large data set applications. We show that a sparse signal model in general has advantages in audio coding and that a sparse gammatone signal model encodes speech more efficiently, in terms of sparseness, than a sparse modified discrete cosine transform (MDCT) signal model. We also show that the optimal gammatone parameters derived for English speech do not match the human auditory filters, which suggests that signal processing applications should derive the parameters individually for each signal class rather than use psychometrically derived parameters. For brain research, this means that care should be taken when directly transferring findings of optimality from technical to biological systems.
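
To make the idea of a sparse gammatone signal model concrete, the following is a minimal sketch of plain (non-accelerated) matching pursuit over a small gammatone dictionary. It is not the accelerated algorithm of the paper, whose details the abstract does not give. The function names `gammatone_atom` and `matching_pursuit`, the filter order of 4, the 25 ms atom length, the 16-channel frequency grid, and the bandwidth rule b = 1.019 · ERB(f) with ERB(f) = 24.7(4.37 f/1000 + 1) are all assumptions chosen for illustration; the last is the kind of psychometrically derived parameter the abstract argues may not be optimal for a given signal class.

```python
import numpy as np

def gammatone_atom(fc, fs, order=4, duration=0.025, bw_scale=1.019):
    """Unit-norm gammatone atom at centre frequency fc (Hz).

    Assumption: bandwidth follows the ERB rule b = bw_scale * ERB(fc).
    """
    t = np.arange(int(round(duration * fs))) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    b = bw_scale * erb
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse decomposition: returns [(atom index, shift, coefficient)] and the residual.

    Assumes every atom is no longer than the signal.
    """
    residual = np.asarray(signal, dtype=float).copy()
    picks = []
    for _ in range(n_iter):
        best = None
        for k, atom in enumerate(atoms):
            # Inner product of the residual with the atom at every time shift.
            corr = np.correlate(residual, atom, mode="valid")
            shift = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[shift]) > abs(best[2]):
                best = (k, shift, corr[shift])
        k, shift, coef = best
        # Subtract the projection onto the best-matching shifted atom.
        residual[shift:shift + len(atoms[k])] -= coef * atoms[k]
        picks.append((k, shift, float(coef)))
    return picks, residual

# Hypothetical demo: a signal made of one scaled, shifted atom.
fs = 16000
centre_freqs = np.geomspace(100.0, 4000.0, 16)
atoms = [gammatone_atom(fc, fs) for fc in centre_freqs]
signal = np.zeros(2000)
signal[300:300 + len(atoms[5])] += 2.0 * atoms[5]
picks, residual = matching_pursuit(signal, atoms, n_iter=1)
```

Each iteration here recomputes every correlation from scratch, which is the cost that makes the naive method slow on long recordings. Sweeping `bw_scale` or `order` and comparing how many atoms are needed to reach a given residual energy would be one simple way to probe whether the psychometric parameters are the sparsest choice for a particular signal class.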
