Elhilali Mounya, Shamma Shihab A
Department of Electrical and Computer Engineering, Johns Hopkins University, Barton Hall, Baltimore, Maryland 21218, USA.
J Acoust Soc Am. 2008 Dec;124(6):3751-71. doi: 10.1121/1.3001672.
Sound systems and speech technologies can benefit greatly from a deeper understanding of how the auditory system, and particularly the auditory cortex, is able to parse complex acoustic scenes into meaningful auditory objects and streams under adverse conditions. In the current work, a biologically plausible model of this process is presented, where the role of cortical mechanisms in organizing complex auditory scenes is explored. The model consists of two stages: (i) a feature analysis stage that maps the acoustic input into a multidimensional cortical representation and (ii) an integrative stage that recursively builds up expectations of how streams evolve over time and reconciles its predictions with the incoming sensory input by sorting it into different clusters. This approach yields a robust computational scheme for speaker separation under conditions of speech or music interference. The model can also emulate the archetypal streaming percepts of tonal stimuli that have long been tested in human subjects. The implications of this model are discussed with respect to the physiological correlates of streaming in the cortex as well as the role of attention and other top-down influences in guiding sound organization.
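To make the two-stage organization concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation: it assumes a magnitude spectrogram as a stand-in for the multidimensional cortical feature analysis, and a simple online predictive-clustering loop as a stand-in for the integrative stage. All function names, parameters, and the toy alternating-tone input are hypothetical.

```python
# Illustrative sketch of the two-stage scheme described in the abstract.
# Assumptions (not from the paper): the feature stage is approximated by a
# magnitude spectrogram, and the integrative stage by stream "templates" that
# are updated recursively and compete for each incoming frame based on
# prediction error.

import numpy as np


def feature_stage(signal, win=256, hop=128):
    """Map the acoustic input to a time-frequency feature representation.

    Stands in for the multidimensional cortical (spectro-temporal) analysis.
    """
    n_frames = 1 + (len(signal) - win) // hop
    window = np.hanning(win)
    frames = np.stack([signal[i * hop:i * hop + win] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, n_bins)


def integrative_stage(features, n_streams=2, alpha=0.1, seed=0):
    """Recursively build expectations for each stream and sort frames.

    Each stream keeps a running prediction (an exponentially smoothed
    spectral template); every frame is assigned to the stream whose
    prediction it matches best, and that stream's expectation is updated.
    """
    rng = np.random.default_rng(seed)
    n_frames, _ = features.shape
    templates = features[rng.choice(n_frames, n_streams, replace=False)].copy()
    labels = np.zeros(n_frames, dtype=int)

    for t, frame in enumerate(features):
        # Prediction error of each stream's expectation against the new frame.
        errors = np.linalg.norm(templates - frame, axis=1)
        k = int(np.argmin(errors))
        labels[t] = k
        # Reconcile: pull the winning stream's expectation toward the input.
        templates[k] = (1 - alpha) * templates[k] + alpha * frame
    return labels, templates


if __name__ == "__main__":
    # Toy "scene": two alternating tones (an A-B-A-B streaming stimulus).
    sr = 8000
    t = np.arange(int(0.05 * sr)) / sr
    tone_a = np.sin(2 * np.pi * 440 * t)
    tone_b = np.sin(2 * np.pi * 880 * t)
    scene = np.concatenate([tone_a, tone_b] * 20)

    feats = feature_stage(scene)
    labels, _ = integrative_stage(feats)
    print("frame-to-stream assignment:", labels[:20])
```

The toy example only illustrates the control flow (predict, assign, update); the model in the paper relies on a much richer cortical feature space and a principled recursive estimation scheme to achieve the speaker separation and streaming behavior described above.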