Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

Affiliation

Department of Physics and Center for Neural Engineering, The Pennsylvania State University, University Park, PA 16802, U.S.A.

Publication information

Neural Comput. 2014 Mar;26(3):523-56. doi: 10.1162/NECO_a_00557. Epub 2013 Dec 9.

DOI: 10.1162/NECO_a_00557
PMID: 24320849
Abstract

Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences: one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common subsequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
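
The template-based decoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a spike sequence is represented as a list of integer neuron IDs ordered by spike time, and that similarity is the longest-common-subsequence length divided by the longer sequence's length; the abstract states only that the measure is based on the LCS length, so this normalization, the function names, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """LCS length normalized by the longer sequence (assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def classify(seq: Sequence[int], templates: Dict[str, List[Sequence[int]]]) -> str:
    """Return the word whose best-matching template is most similar to seq."""
    best_word, best_score = "", -1.0
    for word, examples in templates.items():
        score = max(similarity(seq, t) for t in examples)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Toy templates (hypothetical): neuron IDs in firing order for clean training words.
templates = {
    "one": [[3, 7, 1, 9, 4]],
    "two": [[5, 2, 8, 6]],
}

# A "noisy" test sequence: one spurious spike (11) and one missing spike (9).
noisy = [3, 11, 7, 1, 4]
print(classify(noisy, templates))
```

Because a subsequence match tolerates inserted and deleted elements, a few spurious or missing spikes lower the score only gradually, which is one plausible reason such a measure would suit noisy input.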

Similar articles

1
Noise-robust speech recognition through auditory feature detection and spike sequence decoding.
Neural Comput. 2014 Mar;26(3):523-56. doi: 10.1162/NECO_a_00557. Epub 2013 Dec 9.
2
The robustness of speech representations obtained from simulated auditory nerve fibers under different noise conditions.
J Acoust Soc Am. 2013 Sep;134(3):EL282-8. doi: 10.1121/1.4817912.
3
Toward optimizing stream fusion in multistream recognition of speech.
J Acoust Soc Am. 2011 Jul;130(1):EL14-8. doi: 10.1121/1.3595744.
4
Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.
J Acoust Soc Am. 2011 Jan;129(1):388-403. doi: 10.1121/1.3514525.
5
Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition.
J Acoust Soc Am. 2012 May;131(5):4134-51. doi: 10.1121/1.3699200.
6
Modeling the temporal dynamics of distinctive feature landmark detectors for speech recognition.
J Acoust Soc Am. 2008 Sep;124(3):1739-58. doi: 10.1121/1.2956472.
7
Multiexpert automatic speech recognition using acoustic and myoelectric signals.
IEEE Trans Biomed Eng. 2006 Apr;53(4):676-85. doi: 10.1109/TBME.2006.870224.
8
Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals.
Med Eng Phys. 2006 Oct;28(8):741-8. doi: 10.1016/j.medengphy.2005.11.002. Epub 2005 Dec 15.
9
A computer model of auditory efferent suppression: implications for the recognition of speech in noise.
J Acoust Soc Am. 2010 Feb;127(2):943-54. doi: 10.1121/1.3273893.
10
Automatic speech recognition using a predictive echo state network classifier.
Neural Netw. 2007 Apr;20(3):414-23. doi: 10.1016/j.neunet.2007.04.006. Epub 2007 Apr 29.

Cited by

1
Stochastic modeling of central apnea events in preterm infants.
Physiol Meas. 2016 Apr;37(4):463-84. doi: 10.1088/0967-3334/37/4/463. Epub 2016 Mar 10.