Electrical and Computer Engineering Department, University of Maryland, College Park, MD, USA.
Electrical and Computer Engineering Department, The Johns Hopkins University, Baltimore, MD, USA.
Commun Biol. 2024 Oct 25;7(1):1392. doi: 10.1038/s42003-024-07096-3.
Perceptual segregation of complex sounds such as speech and music simultaneously emanating from multiple sources is a remarkable ability that is common to humans and other animals alike. Unlike animal physiological experiments with simplified sounds or human investigations with spatially broad imaging techniques, this study combines insights from animal single-unit recordings with segregation of speech-like sound mixtures. Ferrets are trained to attend to a female voice and detect a target word, both in the presence and in the absence of a concurrent, equally salient male voice. Recordings are made in primary and secondary auditory cortical fields, and in frontal cortex. During task performance, the representation of the female words becomes enhanced relative to that of the male words in all regions, but especially in higher cortical regions. Analysis of the temporal and spectral response characteristics during task performance reveals how speech segregation gradually emerges in the auditory cortex. A computational model evaluated on the same voice mixtures replicates these results and extends them to different attentional targets (attention to female or male voices). These findings underscore the role of temporal coherence, the principle whereby attention to a target voice binds together all neural responses coherently modulated with the target, thus ultimately forming and extracting a common auditory stream.