A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields.

Authors

Carlin Michael A, Elhilali Mounya

Affiliation

Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218 USA.

Publication

IEEE/ACM Trans Audio Speech Lang Process. 2015 Dec;23(12):2422-2433. doi: 10.1109/TASLP.2015.2481179. Epub 2015 Sep 23.

DOI:10.1109/TASLP.2015.2481179

PMID:29904642

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5997283/

Abstract

One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes. It can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectrotemporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.
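As a rough illustration of what an STRF ensemble computes, the sketch below applies the standard linear STRF model, r(t) = Σ_f Σ_τ h(f, τ) s(f, t − τ), to a time-frequency representation. It is not the authors' adaptation procedure, which the abstract does not specify in detail. The function names (`strf_response`, `ensemble_features`), the array shapes, and the random toy data are all assumptions made for illustration.

```python
import numpy as np

def strf_response(spectrogram, strf):
    """Linear STRF response r(t) = sum_f sum_tau strf[f, tau] * spectrogram[f, t - tau].

    spectrogram: array of shape (n_freq, n_time)
    strf:        array of shape (n_freq, n_lags)
    Samples before t = 0 are treated as zero (causal filtering).
    """
    n_freq, n_time = spectrogram.shape
    response = np.zeros(n_time)
    for f in range(n_freq):
        # Full 1-D convolution along time, truncated to the first n_time samples.
        response += np.convolve(spectrogram[f], strf[f], mode="full")[:n_time]
    return response

def ensemble_features(spectrogram, strfs):
    """Stack the responses of an STRF ensemble into a (n_filters, n_time) matrix."""
    return np.stack([strf_response(spectrogram, h) for h in strfs])

# Toy example with hypothetical random data (not from the paper).
rng = np.random.default_rng(0)
spec = rng.random((8, 50))                # 8 frequency channels, 50 time frames
strfs = rng.standard_normal((4, 8, 5))    # 4 filters, 8 channels, 5 time lags
feats = ensemble_features(spec, strfs)
print(feats.shape)
```

In a setup like this, adaptation would amount to changing the entries of `strfs` so that the responses separate speech frames from nonspeech frames more cleanly. How that update is derived is the subject of the paper and is not reproduced here.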


Similar articles

Modeling attention-driven plasticity in auditory cortical receptive fields.

Front Comput Neurosci. 2015 Aug 19;9:106. doi: 10.3389/fncom.2015.00106. eCollection 2015.

Plasticity of Multidimensional Receptive Fields in Core Rat Auditory Cortex Directed by Sound Statistics.

Neuroscience. 2021 Jul 15;467:150-170. doi: 10.1016/j.neuroscience.2021.04.028. Epub 2021 May 2.

Matching Pursuit Analysis of Auditory Receptive Fields' Spectro-Temporal Properties.

Front Syst Neurosci. 2017 Feb 9;11:4. doi: 10.3389/fnsys.2017.00004. eCollection 2017.

Hierarchy of speech-driven spectrotemporal receptive fields in human auditory cortex.

Neuroimage. 2019 Feb 1;186:647-666. doi: 10.1016/j.neuroimage.2018.11.049. Epub 2018 Nov 28.

Do we need STRFs for cocktail parties? On the relevance of physiologically motivated features for human speech perception derived from automatic speech recognition.

Adv Exp Med Biol. 2013;787:333-41. doi: 10.1007/978-1-4614-1590-9_37.

Stimulus-specific effects of noradrenaline in auditory cortex: implications for the discrimination of communication sounds.

J Physiol. 2015 Feb 15;593(4):1003-20. doi: 10.1113/jphysiol.2014.282855. Epub 2014 Dec 18.

Learning spectro-temporal representations of complex sounds with parameterized neural networks.

J Acoust Soc Am. 2021 Jul;150(1):353. doi: 10.1121/10.0005482.

Spectro-temporal modulation detection and its relation to speech perception in children with auditory processing disorder.

Int J Pediatr Otorhinolaryngol. 2020 Apr;131:109860. doi: 10.1016/j.ijporl.2020.109860. Epub 2020 Jan 3.

Understanding auditory spectro-temporal receptive fields and their changes with input statistics by efficient coding principles.

PLoS Comput Biol. 2011 Aug;7(8):e1002123. doi: 10.1371/journal.pcbi.1002123. Epub 2011 Aug 18.

Cited by

Modelling auditory attention.

Philos Trans R Soc Lond B Biol Sci. 2017 Feb 19;372(1714). doi: 10.1098/rstb.2016.0101. Epub 2017 Jan 2.

References

Biomimetic multi-resolution analysis for robust speaker recognition.

EURASIP J Audio Speech Music Process. 2012;2012. doi: 10.1186/1687-4722-2012-22. Epub 2012 Sep 7.

A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition.

IEEE Trans Audio Speech Lang Process. 2013 Feb;21(2):416-426. doi: 10.1109/TASL.2012.2219526. Epub 2012 Sep 18.

Modeling attention-driven plasticity in auditory cortical receptive fields.

Front Comput Neurosci. 2015 Aug 19;9:106. doi: 10.3389/fncom.2015.00106. eCollection 2015.

Mechanisms of noise robust representation of speech in primary auditory cortex.

Proc Natl Acad Sci U S A. 2014 May 6;111(18):6792-7. doi: 10.1073/pnas.1318017111. Epub 2014 Apr 21.

Rapid spectrotemporal plasticity in primary auditory cortex during behavior.

J Neurosci. 2014 Mar 19;34(12):4396-408. doi: 10.1523/JNEUROSCI.2799-13.2014.

Adaptive auditory computations.

Curr Opin Neurobiol. 2014 Apr;25:164-8. doi: 10.1016/j.conb.2014.01.011. Epub 2014 Feb 11.

Attentional Selection in a Cocktail Party Environment Can Be Decoded from Single-Trial EEG.

Cereb Cortex. 2015 Jul;25(7):1697-706. doi: 10.1093/cercor/bht355. Epub 2014 Jan 15.

The what, where and how of auditory-object perception.

Nat Rev Neurosci. 2013 Oct;14(10):693-707. doi: 10.1038/nrn3565.

Sustained firing of model central auditory neurons yields a discriminative spectro-temporal representation for natural sounds.

PLoS Comput Biol. 2013;9(3):e1002982. doi: 10.1371/journal.pcbi.1002982. Epub 2013 Mar 28.

Adult visual cortical plasticity.

Neuron. 2012 Jul 26;75(2):250-64. doi: 10.1016/j.neuron.2012.06.030.
