A cocktail party with a cortical twist: how cortical mechanisms contribute to sound segregation.

Authors

Elhilali Mounya, Shamma Shihab A

Affiliation

Department of Electrical and Computer Engineering, Johns Hopkins University, Barton, Baltimore, Maryland 21218, USA.

Publication

J Acoust Soc Am. 2008 Dec;124(6):3751-71. doi: 10.1121/1.3001672.

DOI: 10.1121/1.3001672

PMID: 19206802

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2676630/

Abstract

Sound systems and speech technologies can benefit greatly from a deeper understanding of how the auditory system, and particularly the auditory cortex, is able to parse complex acoustic scenes into meaningful auditory objects and streams under adverse conditions. In the current work, a biologically plausible model of this process is presented, where the role of cortical mechanisms in organizing complex auditory scenes is explored. The model consists of two stages: (i) a feature analysis stage that maps the acoustic input into a multidimensional cortical representation and (ii) an integrative stage that recursively builds up expectations of how streams evolve over time and reconciles its predictions with the incoming sensory input by sorting it into different clusters. This approach yields a robust computational scheme for speaker separation under conditions of speech or music interference. The model can also emulate the archetypal streaming percepts of tonal stimuli that have long been tested in human subjects. The implications of this model are discussed with respect to the physiological correlates of streaming in the cortex as well as the role of attention and other top-down influences in guiding sound organization.
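
The integrative stage described in the abstract (predict how each stream will evolve, then sort incoming input by how well it matches those predictions) can be illustrated with a toy loop. This is a minimal sketch under loud assumptions, not the authors' model: the single log-frequency feature stands in for the multidimensional cortical representation, the exponentially smoothed expectation stands in for whatever predictor the paper uses, and the function name `segregate` and its parameters `n_streams` and `rate` are hypothetical.

```python
import numpy as np

def segregate(frames, n_streams=2, rate=0.3):
    """Toy predict-and-cluster loop (illustrative stand-in, not the paper's model)."""
    frames = np.asarray(frames, dtype=float)
    # Assumption: seed each stream's expectation with one of the first frames.
    predictions = frames[:n_streams].copy()
    labels = []
    for frame in frames:
        # Compare the incoming frame with what each stream currently expects.
        errors = np.linalg.norm(predictions - frame, axis=1)
        winner = int(np.argmin(errors))
        labels.append(winner)
        # Only the winning stream revises its expectation toward the new input.
        predictions[winner] += rate * (frame - predictions[winner])
    return labels

# Alternating low/high tones, each described by one feature (log2 of frequency in Hz).
tones = [[np.log2(f)] for f in (400, 1600, 400, 1600, 400, 1600)]
print(segregate(tones))  # [0, 1, 0, 1, 0, 1]
```

With widely separated tones the sketch assigns the low and high tones to different streams. Because of the seeding assumption it always produces `n_streams` clusters, so it does not reproduce the fused (single-stream) percept for closely spaced tones.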

Similar articles

A cocktail party with a cortical twist: how cortical mechanisms contribute to sound segregation.

J Acoust Soc Am. 2008 Dec;124(6):3751-71. doi: 10.1121/1.3001672.

Cortical Representations of Speech in a Multitalker Auditory Scene.

J Neurosci. 2017 Sep 20;37(38):9189-9196. doi: 10.1523/JNEUROSCI.0938-17.2017. Epub 2017 Aug 18.

ARTSTREAM: a neural network model of auditory scene analysis and source segregation.

Neural Netw. 2004 May;17(4):511-36. doi: 10.1016/j.neunet.2003.10.002.

Temporal coherence and the streaming of complex sounds.

Adv Exp Med Biol. 2013;787:535-43. doi: 10.1007/978-1-4614-1590-9_59.

Pitch, harmonicity and concurrent sound segregation: psychoacoustical and neurophysiological findings.

Hear Res. 2010 Jul;266(1-2):36-51. doi: 10.1016/j.heares.2009.09.012. Epub 2009 Sep 27.

Speaker normalization using cortical strip maps: a neural model for steady-state vowel categorization.

J Acoust Soc Am. 2008 Dec;124(6):3918-36. doi: 10.1121/1.2997478.

Language experience-dependent advantage in pitch representation in the auditory cortex is limited to favorable signal-to-noise ratios.

Hear Res. 2017 Nov;355:42-53. doi: 10.1016/j.heares.2017.09.006. Epub 2017 Sep 14.

Extensive Tonotopic Mapping across Auditory Cortex Is Recapitulated by Spectrally Directed Attention and Systematically Related to Cortical Myeloarchitecture.

J Neurosci. 2017 Dec 13;37(50):12187-12201. doi: 10.1523/JNEUROSCI.1436-17.2017. Epub 2017 Nov 6.

Functional imaging of auditory scene analysis.

Hear Res. 2014 Jan;307:98-110. doi: 10.1016/j.heares.2013.08.003. Epub 2013 Aug 19.

Cortical tracking of multiple streams outside the focus of attention in naturalistic auditory scenes.

Neuroimage. 2018 Nov 1;181:617-626. doi: 10.1016/j.neuroimage.2018.07.052. Epub 2018 Jul 24.

Cited by

Perceptual clustering in auditory streaming.

PLoS Comput Biol. 2025 Jul 11;21(7):e1013189. doi: 10.1371/journal.pcbi.1013189. eCollection 2025 Jul.

Preliminary Evidence for Global Properties in Human Listeners During Natural Auditory Scene Perception.

Open Mind (Camb). 2024 Mar 26;8:333-365. doi: 10.1162/opmi_a_00131. eCollection 2024.

Increased reliance on temporal coding when target sound is softer than the background.

Sci Rep. 2024 Feb 23;14(1):4457. doi: 10.1038/s41598-024-54865-5.

Effect of Reverberation on Neural Responses to Natural Speech in Rabbit Auditory Midbrain: No Evidence for a Neural Dereverberation Mechanism.

eNeuro. 2023 May 10;10(5). doi: 10.1523/ENEURO.0447-22.2023. Print 2023 May.

Modeling the Repetition-Based Recovering of Acoustic and Visual Sources With Dendritic Neurons.

Front Neurosci. 2022 Apr 28;16:855753. doi: 10.3389/fnins.2022.855753. eCollection 2022.

Making sense of periodicity glimpses in a prediction-update-loop: A computational model of attentive voice tracking.

J Acoust Soc Am. 2022 Feb;151(2):712. doi: 10.1121/10.0009337.

Cortical Processing of Binaural Cues as Shown by EEG Responses to Random-Chord Stereograms.

J Assoc Res Otolaryngol. 2022 Feb;23(1):75-94. doi: 10.1007/s10162-021-00820-4. Epub 2021 Dec 13.

Paradoxical relationship between speed and accuracy in olfactory figure-background segregation.

PLoS Comput Biol. 2021 Dec 6;17(12):e1009674. doi: 10.1371/journal.pcbi.1009674. eCollection 2021 Dec.

Object-based attention in complex, naturalistic auditory streams.

Sci Rep. 2019 Feb 27;9(1):2854. doi: 10.1038/s41598-019-39166-6.

A Gestalt inference model for auditory scene segregation.

PLoS Comput Biol. 2019 Jan 22;15(1):e1006711. doi: 10.1371/journal.pcbi.1006711. eCollection 2019 Jan.

References

Learning Invariance from Transformation Sequences.

Neural Comput. 1991 Summer;3(2):194-200. doi: 10.1162/neco.1991.3.2.194.

Separation of speech from interfering sounds based on oscillatory correlation.

IEEE Trans Neural Netw. 1999;10(3):684-97. doi: 10.1109/72.761727.

Auditory cortical receptive fields: stable entities with plastic abilities.

J Neurosci. 2007 Sep 26;27(39):10372-82. doi: 10.1523/JNEUROSCI.1462-07.2007.

The role of attention in the formation of auditory streams.

Percept Psychophys. 2007 Jan;69(1):136-52. doi: 10.3758/bf03194460.

Does attention play a role in dynamic receptive field adaptation to changing acoustic salience in A1?

Hear Res. 2007 Jul;229(1-2):186-203. doi: 10.1016/j.heares.2007.01.009. Epub 2007 Jan 16.

Spectral modulation detection as a function of modulation frequency, carrier bandwidth, and carrier frequency region.

J Acoust Soc Am. 2007 Jan;121(1):363-72. doi: 10.1121/1.2382347.

The perceptual consequences of binaural hearing.

Int J Audiol. 2006;45 Suppl 1:S34-44. doi: 10.1080/14992020600782642.

A state-space analysis for reconstruction of goal-directed movements using neural signals.

Neural Comput. 2006 Oct;18(10):2465-94. doi: 10.1162/neco.2006.18.10.2465.

Neural encoding and retrieval of sound sequences.

Ann N Y Acad Sci. 2005 Dec;1060:125-35. doi: 10.1196/annals.1360.009.

Spectral processing in the auditory cortex.

Int Rev Neurobiol. 2005;70:253-98. doi: 10.1016/S0074-7742(05)70008-8.
