
Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.

Authors

Schädler Marc René, Kollmeier Birger

Affiliations

Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, D-26111 Oldenburg, Germany.

Publication information

J Acoust Soc Am. 2015 Apr;137(4):2047-59. doi: 10.1121/1.4916618.

DOI: 10.1121/1.4916618
PMID: 25920855
Abstract

To test if simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional-Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional-Gabor filter bank and a temporal one-dimensional-Gabor filter bank. A feature set that is extracted with these separate spectral and temporal modulation filter banks was introduced, the separate Gabor filter bank (SGBFB) features, and evaluated on the CHiME (Computational Hearing in Multisource Environments) keywords-in-noise recognition task. From the perspective of robust ASR, the results showed that spectral and temporal processing can be performed independently and are not required to interact with each other. Using SGBFB features permitted the signal-to-noise ratio (SNR) to be lowered by 1.2 dB while still performing as well as the GBFB-based reference system, which corresponds to a relative improvement of the word error rate by 12.8%. Additionally, the real time factor of the spectro-temporal processing could be reduced by more than an order of magnitude. Compared to human listeners, the SNR needed to be 13 dB higher when using Mel-frequency cepstral coefficient features, 11 dB higher when using GBFB features, and 9 dB higher when using SGBFB features to achieve the same recognition performance.
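
The decomposition described in the abstract can be illustrated with a minimal Python sketch. It shows only the general principle that a two-dimensional filter built as the outer product of a spectral and a temporal one-dimensional filter can be applied as two successive one-dimensional convolutions, at much lower cost. It is not the authors' implementation: the filter lengths, modulation frequencies, Hann envelope, and the random array standing in for a spectrogram are all assumptions chosen for illustration, and the helper names (`gabor_1d`, `conv2d_full`, `conv_separable_full`) are hypothetical.

```python
import numpy as np

def gabor_1d(length, omega):
    """Hann-windowed complex sinusoid: a simple 1D Gabor-like filter.
    The length and omega values used below are illustrative assumptions."""
    n = np.arange(length) - (length - 1) / 2
    return np.hanning(length) * np.exp(1j * omega * n)

def conv2d_full(image, kernel):
    """Direct 2D 'full' convolution, used only as a reference result."""
    rows, cols = image.shape
    k_rows, k_cols = kernel.shape
    out = np.zeros((rows + k_rows - 1, cols + k_cols - 1), dtype=complex)
    for i in range(k_rows):
        for j in range(k_cols):
            out[i:i + rows, j:j + cols] += kernel[i, j] * image
    return out

def conv_separable_full(image, k_spec, k_temp):
    """Apply the spectral filter along axis 0, then the temporal filter along axis 1."""
    tmp = np.apply_along_axis(lambda col: np.convolve(col, k_spec), 0, image)
    return np.apply_along_axis(lambda row: np.convolve(row, k_temp), 1, tmp)

# Random stand-in for a (frequency band x time frame) spectrogram.
rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((23, 100))

k_spec = gabor_1d(9, omega=0.5)    # spectral modulation filter (assumed parameters)
k_temp = gabor_1d(15, omega=0.3)   # temporal modulation filter (assumed parameters)

# A 2D filter that is exactly the outer product of the two 1D filters.
kernel_2d = np.outer(k_spec, k_temp)

full_2d = conv2d_full(spectrogram, kernel_2d)
separable = conv_separable_full(spectrogram, k_spec, k_temp)

print("results match:", np.allclose(full_2d, separable))
print("multiplies per output sample, 2D:", k_spec.size * k_temp.size)
print("multiplies per output sample, separable:", k_spec.size + k_temp.size)
```

With these assumed lengths, the 2D filter needs 9 × 15 multiplications per output sample and the separable version needs 9 + 15. A saving of this kind is consistent with the reduced real time factor reported in the abstract, although the actual figures depend on the filter bank used in the paper.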

Similar articles

1
Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.
J Acoust Soc Am. 2015 Apr;137(4):2047-59. doi: 10.1121/1.4916618.
2
Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition.
J Acoust Soc Am. 2012 May;131(5):4134-51. doi: 10.1121/1.3699200.
3
A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception.
J Acoust Soc Am. 2016 May;139(5):2708. doi: 10.1121/1.4948772.
4
Spectro-temporal modulation energy based mask for robust speaker identification.
J Acoust Soc Am. 2012 May;131(5):EL368-74. doi: 10.1121/1.3697534.
5
Do we need STRFs for cocktail parties? On the relevance of physiologically motivated features for human speech perception derived from automatic speech recognition.
Adv Exp Med Biol. 2013;787:333-41. doi: 10.1007/978-1-4614-1590-9_37.
6
Word recognition for temporally and spectrally distorted materials: the effects of age and hearing loss.
Ear Hear. 2012 May-Jun;33(3):349-66. doi: 10.1097/AUD.0b013e318242571c.
7
Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.
J Acoust Soc Am. 2011 Jan;129(1):388-403. doi: 10.1121/1.3514525.
8
Auditory motivated front-end for noisy speech using spectro-temporal modulation filtering.
J Acoust Soc Am. 2014 Nov;136(5):EL343-9. doi: 10.1121/1.4896406.
9
Characteristics of spectro-temporal modulation frequency selectivity in humans.
J Acoust Soc Am. 2017 Mar;141(3):1887. doi: 10.1121/1.4976537.
10
Nonlinear spectro-temporal features based on a cochlear model for automatic speech recognition in a noisy situation.
Neural Netw. 2013 Sep;45:62-9. doi: 10.1016/j.neunet.2013.02.006. Epub 2013 Mar 7.

Cited by

1
Spontaneous emergence of rudimentary music detectors in deep neural networks.
Nat Commun. 2024 Jan 2;15(1):148. doi: 10.1038/s41467-023-44516-0.
2
Spectro-temporal modulation glimpsing for speech intelligibility prediction.
Hear Res. 2022 Dec;426:108620. doi: 10.1016/j.heares.2022.108620. Epub 2022 Sep 21.
3
Attention Differentially Affects Acoustic and Phonetic Feature Encoding in a Multispeaker Environment.
J Neurosci. 2022 Jan 26;42(4):682-691. doi: 10.1523/JNEUROSCI.1455-20.2021. Epub 2021 Dec 10.
4
Speech Intelligibility Prediction using Spectro-Temporal Modulation Analysis.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:210-225. doi: 10.1109/taslp.2020.3039929. Epub 2020 Nov 24.
5
Time-frequency scattering accurately models auditory similarities between instrumental playing techniques.
EURASIP J Audio Speech Music Process. 2021;2021(1):3. doi: 10.1186/s13636-020-00187-z. Epub 2021 Jan 11.
6
Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms.
Trends Hear. 2018 Jan-Dec;22:2331216518768954. doi: 10.1177/2331216518768954.
7
Matching Pursuit Analysis of Auditory Receptive Fields' Spectro-Temporal Properties.
Front Syst Neurosci. 2017 Feb 9;11:4. doi: 10.3389/fnsys.2017.00004. eCollection 2017.