The cocktail-party problem revisited: early processing and selection of multi-talker speech.

Author information

Bronkhorst Adelbert W

Affiliation

TNO Human Factors, POB 23, 3769 ZG, Soesterberg, The Netherlands.

Publication information

Atten Percept Psychophys. 2015 Jul;77(5):1465-87. doi: 10.3758/s13414-015-0882-9.

Abstract

How do we recognize what one person is saying when others are speaking at the same time? This review summarizes widespread research in psychoacoustics, auditory scene analysis, and attention, all dealing with early processing and selection of speech, which has been stimulated by this question. Important effects occurring at the peripheral and brainstem levels are mutual masking of sounds and "unmasking" resulting from binaural listening. Psychoacoustic models have been developed that can predict these effects accurately, albeit using computational approaches rather than approximations of neural processing. Grouping—the segregation and streaming of sounds—represents a subsequent processing stage that interacts closely with attention. Sounds can be easily grouped—and subsequently selected—using primitive features such as spatial location and fundamental frequency. More complex processing is required when lexical, syntactic, or semantic information is used. Whereas it is now clear that such processing can take place preattentively, there also is evidence that the processing depth depends on the task-relevancy of the sound. This is consistent with the presence of a feedback loop in attentional control, triggering enhancement of to-be-selected input. Despite recent progress, there are still many unresolved issues: there is a need for integrative models that are neurophysiologically plausible, for research into grouping based on other than spatial or voice-related cues, for studies explicitly addressing endogenous and exogenous attention, for an explanation of the remarkable sluggishness of attention focused on dynamically changing sounds, and for research elucidating the distinction between binaural speech perception and sound localization.
