Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection.

Authors

Bellur Ashwin, Elhilali Mounya

Affiliation

Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2017 Mar;25(3):481-492. doi: 10.1109/TASLP.2016.2639322. Epub 2016 Dec 13.

DOI: 10.1109/TASLP.2016.2639322
PMID: 28736736
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5516649/
Abstract

Parsing natural acoustic scenes using computational methodologies poses many challenges. Given the rich and complex nature of the acoustic environment, data mismatch between train and test conditions is a major hurdle in data-driven audio processing systems. In contrast, the brain exhibits a remarkable ability at segmenting acoustic scenes with relative ease. When tackling challenging listening conditions that are often faced in everyday life, the biological system relies on a number of principles that allow it to effortlessly parse its rich soundscape. In the current study, we leverage a key principle employed by the auditory system: its ability to adapt the neural representation of its sensory input in a high-dimensional space. We propose a framework that mimics this process in a computational model for robust speech activity detection. The system employs a 2-D Gabor filter bank whose parameters are retuned offline to improve the separability between the feature representation of speech and nonspeech sounds. This retuning process, driven by feedback from statistical models of speech and nonspeech classes, attempts to minimize the misclassification risk of mismatched data, with respect to the original statistical models. We hypothesize that this risk minimization procedure results in an emphasis of unique speech and nonspeech modulations in the high-dimensional space. We show that such an adapted system is indeed robust to other novel conditions, with a marked reduction in equal error rates for a variety of databases with additive and convolutive noise distortions. We discuss the lessons learned from biology with regard to adapting to an ever-changing acoustic environment and the impact on building truly intelligent audio processing systems.
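Two ingredients named in the abstract, a 2-D Gabor filter and the equal error rate (EER), can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gabor_2d` and `equal_error_rate`, the kernel parameterization (a Gaussian envelope times a cosine carrier over frequency and time), and the threshold-sweep EER estimate are all assumptions chosen for illustration, and the feedback-driven retuning step itself is not shown.

```python
import numpy as np


def gabor_2d(n_freq, n_time, omega_f, omega_t, sigma_f, sigma_t):
    """Real 2-D Gabor kernel over (frequency, time) bins.

    Hypothetical parameterization: omega_* are carrier rates in radians
    per bin, sigma_* are Gaussian envelope widths in bins.
    """
    f = np.arange(n_freq) - (n_freq - 1) / 2.0
    t = np.arange(n_time) - (n_time - 1) / 2.0
    F, T = np.meshgrid(f, t, indexing="ij")
    envelope = np.exp(-0.5 * ((F / sigma_f) ** 2 + (T / sigma_t) ** 2))
    carrier = np.cos(omega_f * F + omega_t * T)
    kernel = envelope * carrier
    # Subtract the mean so the kernel ignores constant energy offsets.
    return kernel - kernel.mean()


def equal_error_rate(speech_scores, nonspeech_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    Higher scores are assumed to indicate speech. The returned value is
    the average of miss and false-alarm rates at the threshold where the
    two rates are closest.
    """
    speech_scores = np.asarray(speech_scores, dtype=float)
    nonspeech_scores = np.asarray(nonspeech_scores, dtype=float)
    thresholds = np.sort(np.concatenate([speech_scores, nonspeech_scores]))
    best_gap, eer = np.inf, 1.0
    for th in thresholds:
        miss = np.mean(speech_scores < th)            # speech rejected
        false_alarm = np.mean(nonspeech_scores >= th)  # nonspeech accepted
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer
```

In a system of the kind the abstract describes, kernels like this would be applied to a time-frequency representation of the audio, and retuning would adjust their parameters; here the parameters are simply fixed inputs.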
