A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields.

Author information

Carlin Michael A, Elhilali Mounya

Affiliations

Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2015 Dec;23(12):2422-2433. doi: 10.1109/TASLP.2015.2481179. Epub 2015 Sep 23.

Abstract

One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes. It can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectrotemporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.
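
The abstract describes detecting speech from the responses of an ensemble of STRFs. As a rough illustration of that general idea only, the sketch below filters a toy spectrogram with a few fixed Gabor-shaped filters and thresholds the smoothed response energy. Everything here is an assumption made for illustration: the Gabor filter shape, the frame rate, the threshold, the synthetic signal, and all function names are hypothetical, and the task-driven adaptation step that is the paper's contribution is not implemented.

```python
import numpy as np

FRAME_RATE = 100.0  # spectrogram frames per second (assumed value)


def gabor_strf(n_freq=16, n_lag=50, rate_hz=4.0, scale=0.0):
    """Hypothetical Gabor-shaped STRF of shape (frequency channel, time lag).

    rate_hz is the temporal modulation rate; scale is the spectral
    modulation in cycles per channel. Not the paper's neurophysiological STRFs.
    """
    f = np.arange(n_freq) - (n_freq - 1) / 2.0
    t = np.arange(n_lag) / FRAME_RATE
    envelope = np.outer(np.hanning(n_freq), np.hanning(n_lag))
    carrier = np.cos(2 * np.pi * (rate_hz * t[None, :] + scale * f[:, None]))
    strf = envelope * carrier
    strf -= strf.mean()                  # zero mean: no response to a constant level
    return strf / np.linalg.norm(strf)   # unit norm so filters are comparable


def strf_response(spectrogram, strf):
    """Causal response r[t] = sum over f and lag of strf[f, lag] * s[f, t - lag]."""
    n_freq, n_frames = spectrogram.shape
    response = np.zeros(n_frames)
    for ch in range(n_freq):
        response += np.convolve(spectrogram[ch], strf[ch], mode="full")[:n_frames]
    return response


def detect_activity(spectrogram, strfs, threshold=1.0, smooth_frames=25):
    """Frame-level decision: mean smoothed response energy above a fixed threshold."""
    kernel = np.ones(smooth_frames) / smooth_frames
    energy = np.zeros(spectrogram.shape[1])
    for strf in strfs:
        r = strf_response(spectrogram, strf)
        energy += np.convolve(r ** 2, kernel, mode="same")
    energy /= len(strfs)
    return energy > threshold, energy


def toy_spectrogram(n_freq=16, n_frames=400, seed=0):
    """Weak stationary noise with a 4 Hz amplitude-modulated segment in frames 150-299."""
    rng = np.random.default_rng(seed)
    spec = 0.1 * rng.standard_normal((n_freq, n_frames))
    t = np.arange(150, 300) / FRAME_RATE
    spec[:, 150:300] += 1.0 + np.cos(2 * np.pi * 4.0 * t)
    return spec


spec = toy_spectrogram()
ensemble = [gabor_strf(rate_hz=r) for r in (2.0, 4.0, 8.0)]
decisions, energy = detect_activity(spec, ensemble)
```

In this toy setting the modulated segment drives the 4 Hz filter strongly while the stationary noise does not, so `decisions` is true inside the segment and false well before it. A fixed filter bank like this corresponds most closely to the "unadapted ensemble" the abstract compares against; an adapted ensemble would reshape the filters toward the modulations of the target class.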
