Griffith Ian M, Hess R Preston, McDermott Josh H
Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA.
McGovern Institute for Brain Research, MIT, Cambridge, MA, USA.
bioRxiv. 2025 May 28:2025.05.28.656682. doi: 10.1101/2025.05.28.656682.
Attention facilitates communication by enabling selective listening to sound sources of interest. However, little is known about why attentional selection succeeds in some conditions but fails in others. While neurophysiology implicates multiplicative feature gains in selective attention, it is unclear whether such gains can explain real-world attention-driven behavior. To investigate these issues, we optimized an artificial neural network with stimulus-computable, feature-based gains to recognize a cued talker's speech from binaural audio in "cocktail party" scenarios. Though not trained to mimic humans, the model matched human performance across diverse real-world conditions, exhibiting selection based both on voice qualities and spatial location. It also predicted novel attentional effects that we confirmed in human experiments, and exhibited signatures of "late selection" like those seen in human auditory cortex. The results suggest that human-like attentional strategies naturally arise from optimization of feature gains for selective listening, offering a normative account of the mechanisms, and limitations, of auditory attention.
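As a rough illustration of what "multiplicative feature gains" can mean computationally, the sketch below derives one gain per feature channel from a cue signal and uses it to scale the features of a mixture. It is not the authors' model: the function names `cue_to_gains` and `apply_gains`, the `sharpness` parameter, the sigmoid mapping, and the toy arrays are all assumptions made for this example, whereas the abstract describes gains that are optimized within a neural network.

```python
import numpy as np


def cue_to_gains(cue_features, sharpness=4.0):
    """Map cue features of shape (channels, time) to one gain per channel.

    Hypothetical rule: channels that are more active than average in the
    cue get gains above 1, less active channels get gains below 1.
    """
    mean_act = cue_features.mean(axis=1)
    # Standardize across channels so the mapping does not depend on scale.
    z = (mean_act - mean_act.mean()) / (mean_act.std() + 1e-8)
    # Sigmoid scaled to the open interval (0, 2); a gain of 1 is neutral.
    return 2.0 / (1.0 + np.exp(-sharpness * z))


def apply_gains(mixture_features, gains):
    """Scale each channel of the mixture (channels, time) by its gain."""
    return mixture_features * gains[:, None]


# Toy example: the cue activates only channel 0.
cue = np.zeros((4, 10))
cue[0, :] = 1.0
mixture = np.ones((4, 10))

gains = cue_to_gains(cue)
attended = apply_gains(mixture, gains)
```

In this toy case the channel that matches the cue is amplified and the remaining channels are attenuated, which is the basic sense in which a multiplicative gain can implement feature-based selection.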