Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA.
Adv Exp Med Biol. 2013;787:535-43. doi: 10.1007/978-1-4614-1590-9_59.
Humans and other animals can attend to one of multiple sounds and follow it selectively over time. The neural underpinnings of this perceptual feat remain mysterious. Some studies have concluded that sounds are heard as separate streams when they activate well-separated populations of central auditory neurons, and that this process is largely pre-attentive. Here, we propose instead that stream formation depends primarily on temporal coherence between responses that encode the various features of a sound source. Furthermore, we postulate that only when attention is directed toward a particular feature (e.g., pitch) do all other temporally coherent features of that source (e.g., timbre and location) become bound together as a stream that is segregated from the incoherent features of other sources. Experimental neurophysiological evidence in support of this hypothesis will be presented. The focus, however, will be on a computational realization of this idea and a discussion of the insights gained from simulations that disentangle complex sound sources such as speech and music. The model consists of a representational stage of early and cortical auditory processing that creates a multidimensional depiction of various sound attributes such as pitch, location, and spectral resolution. A subsequent stage computes a coherence matrix that summarizes the pairwise correlations between all channels making up the cortical representation. Finally, the perceived segregated streams are extracted by decomposing the coherence matrix into its uncorrelated components. Questions raised by the model are discussed, especially the role of attention in streaming and the search for further neural correlates of streaming percepts.
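The two model stages described above (a coherence matrix of pairwise channel correlations, then decomposition into uncorrelated components) can be sketched minimally in NumPy. The synthetic channel responses, their sizes, and the eigenvector-based grouping rule below are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np

# Hypothetical stand-in for a cortical representation: T time frames
# across 5 feature channels. Channels 0-2 are driven by source A,
# channels 3-4 by source B, so each group is temporally coherent.
rng = np.random.default_rng(0)
T = 500
source_a = rng.standard_normal(T)
source_b = rng.standard_normal(T)
X = np.column_stack([
    source_a + 0.1 * rng.standard_normal(T),
    source_a + 0.1 * rng.standard_normal(T),
    source_a + 0.1 * rng.standard_normal(T),
    source_b + 0.1 * rng.standard_normal(T),
    source_b + 0.1 * rng.standard_normal(T),
])

# Coherence matrix: pairwise correlations between all feature channels.
C = np.corrcoef(X, rowvar=False)

# Decompose into uncorrelated components: the dominant eigenvectors of C
# load on temporally coherent channel groups, i.e. the candidate streams.
eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
top2 = eigvecs[:, -2:]                    # two dominant components
stream = np.argmax(np.abs(top2), axis=1)  # assign each channel to a stream
```

Channels sharing a source end up with the same stream label because their correlated responses load on the same eigenvector; incoherent channels load on a different one, which is the sense in which the decomposition segregates streams.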