The cocktail-party problem revisited: early processing and selection of multi-talker speech.

Author information

Bronkhorst Adelbert W

Affiliation

TNO Human Factors, POB 23, 3769 ZG, Soesterberg, The Netherlands

Publication information

Atten Percept Psychophys. 2015 Jul;77(5):1465-87. doi: 10.3758/s13414-015-0882-9.

DOI: 10.3758/s13414-015-0882-9

PMID: 25828463

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4469089/

Abstract

How do we recognize what one person is saying when others are speaking at the same time? This review summarizes widespread research in psychoacoustics, auditory scene analysis, and attention, all dealing with early processing and selection of speech, which has been stimulated by this question. Important effects occurring at the peripheral and brainstem levels are mutual masking of sounds and "unmasking" resulting from binaural listening. Psychoacoustic models have been developed that can predict these effects accurately, albeit using computational approaches rather than approximations of neural processing. Grouping—the segregation and streaming of sounds—represents a subsequent processing stage that interacts closely with attention. Sounds can be easily grouped—and subsequently selected—using primitive features such as spatial location and fundamental frequency. More complex processing is required when lexical, syntactic, or semantic information is used. Whereas it is now clear that such processing can take place preattentively, there also is evidence that the processing depth depends on the task-relevancy of the sound. This is consistent with the presence of a feedback loop in attentional control, triggering enhancement of to-be-selected input. Despite recent progress, there are still many unresolved issues: there is a need for integrative models that are neurophysiologically plausible, for research into grouping based on other than spatial or voice-related cues, for studies explicitly addressing endogenous and exogenous attention, for an explanation of the remarkable sluggishness of attention focused on dynamically changing sounds, and for research elucidating the distinction between binaural speech perception and sound localization.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9588/4469089/67a23770f6e6/13414_2015_882_Fig1_HTML.jpg

Similar articles

The cocktail-party problem revisited: early processing and selection of multi-talker speech.

Atten Percept Psychophys. 2015 Jul;77(5):1465-87. doi: 10.3758/s13414-015-0882-9.

Peripheral hearing loss reduces the ability of children to direct selective attention during multi-talker listening.

Hear Res. 2017 Jul;350:160-172. doi: 10.1016/j.heares.2017.05.005. Epub 2017 May 10.

Visually guided auditory attention in a dynamic "cocktail-party" speech perception task: ERP evidence for age-related differences.

Hear Res. 2017 Feb;344:98-108. doi: 10.1016/j.heares.2016.11.001. Epub 2016 Nov 5.

The role of reverberation-related binaural cues in the externalization of speech.

J Acoust Soc Am. 2015 Aug;138(2):1154-67. doi: 10.1121/1.4928132.

Cocktail party listening in a dynamic multitalker environment.

Percept Psychophys. 2007 Jan;69(1):79-91. doi: 10.3758/bf03194455.

Inharmonic speech reveals the role of harmonicity in the cocktail party problem.

Nat Commun. 2018 May 29;9(1):2122. doi: 10.1038/s41467-018-04551-8.

Can basic auditory and cognitive measures predict hearing-impaired listeners' localization and spatial speech recognition abilities?

J Acoust Soc Am. 2011 Sep;130(3):1542-58. doi: 10.1121/1.3608122.

Speech intelligibility among modulated and spatially distributed noise sources.

J Acoust Soc Am. 2013 Apr;133(4):2254-61. doi: 10.1121/1.4794384.

Speech segregation based on sound localization.

J Acoust Soc Am. 2003 Oct;114(4 Pt 1):2236-52. doi: 10.1121/1.1610463.

Spatial Release From Masking in Simulated Cochlear Implant Users With and Without Access to Low-Frequency Acoustic Hearing.

Trends Hear. 2015 Dec 30;19:2331216515616940. doi: 10.1177/2331216515616940.

Cited by

The role of harmonicity on listeners' ability to hear out voices in polyphonic music.

Sci Rep. 2025 Aug 28;15(1):31686. doi: 10.1038/s41598-025-16404-8.

Evaluation of Speaker-Conditioned Target Speaker Extraction Algorithms for Hearing-Impaired Listeners.

Trends Hear. 2025 Jan-Dec;29:23312165251365802. doi: 10.1177/23312165251365802. Epub 2025 Aug 11.

Initial Evaluation of a New Auditory Attention Task for Assessing Alerting, Orienting, and Executive Control Attention.

J Speech Lang Hear Res. 2025 Aug 12;68(8):4049-4060. doi: 10.1044/2025_JSLHR-24-00513. Epub 2025 Jul 17.

Reduced Neural Speech Tracking in Adolescents with Listening Difficulty.

medRxiv. 2025 Jun 24:2025.06.24.25330187. doi: 10.1101/2025.06.24.25330187.

Speaker-story mapping as a method to evaluate audiovisual scene analysis in a virtual classroom scenario.

Front Psychol. 2025 Jun 10;16:1520630. doi: 10.3389/fpsyg.2025.1520630. eCollection 2025.

Fundamental dimensions of real-time word recognition in challenging listening conditions exhibit within-subject stability and link to outcomes.

J Exp Psychol Gen. 2025 Jun 23. doi: 10.1037/xge0001788.

Auditory Learning and Generalization in Older Adults: Evidence from Voice Discrimination Training.

Trends Hear. 2025 Jan-Dec;29:23312165251342436. doi: 10.1177/23312165251342436. Epub 2025 May 27.

Am J Audiol. 2025 Jun 3;34(2):388-399. doi: 10.1044/2025_AJA-24-00253. Epub 2025 May 9.

Comparing MEG and EEG measurement set-ups for a brain-computer interface based on selective auditory attention.

PLoS One. 2025 Apr 10;20(4):e0319328. doi: 10.1371/journal.pone.0319328. eCollection 2025.

The Relationship Between Spatial Release From Masking and Listening Effort Among Cochlear Implant Users With Single-Sided Deafness.

Ear Hear. 2025;46(3):624-639. doi: 10.1097/AUD.0000000000001611. Epub 2025 Feb 19.
