Hambrook Dillon A, Ilievski Marko, Mosadeghzad Mohamad, Tata Matthew
Department of Neuroscience, University of Lethbridge, Lethbridge, Alberta, Canada.
PLoS One. 2017 Oct 5;12(10):e0186104. doi: 10.1371/journal.pone.0186104. eCollection 2017.
The process of resolving mixtures of several sounds into their separate individual streams is known as auditory scene analysis, and it remains a challenging task for computational systems. It is well known that animals use binaural differences in arrival time and intensity at the two ears to find the arrival angle of sounds in the azimuthal plane, and this localization function has sometimes been considered sufficient to enable the unmixing of complex scenes. However, the ability of such systems to resolve distinct sound sources in both space and frequency remains limited. The neural computations for detecting interaural time difference (ITD) have been well studied and have served as the inspiration for computational auditory scene analysis systems; however, a crucial limitation of ITD models is that they produce ambiguous or "phantom" images in the scene. This has been thought to limit their usefulness at frequencies above about 1 kHz in humans. We present a simple Bayesian model, and an implementation on a robot, that uses ITD information recursively. The model makes use of head rotations to show that ITD information is sufficient to unambiguously resolve sound sources in both space and frequency. Contrary to commonly held assumptions about sound localization, we show that the ITD cue, used with high-frequency sound, can provide accurate and unambiguous localization and resolution of competing sounds. Our findings suggest that an "active hearing" approach could be useful in robotic systems that operate in natural, noisy settings. We also suggest that neurophysiological models of sound localization in animals could benefit from revision to include the influence of top-down memory and sensorimotor integration across head rotations.
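To make the recursive-Bayesian idea concrete, the sketch below illustrates one way head rotation can disambiguate the phase-wrapped ITD likelihood of a high-frequency tone. It is not the authors' implementation: the spherical-head ITD model tau(theta) = (a/c)*sin(theta), the azimuth grid, the Gaussian ITD noise, and all parameter values (head radius, tone frequency, rotation schedule, source position) are illustrative assumptions. Each head angle shifts the phantom peaks of the likelihood in the world frame while the true peak stays fixed, so the recursive update P(theta | tau_1..n) proportional to P(tau_n | theta, head_n) * P(theta | tau_1..n-1) collapses onto the real source.

import numpy as np

A = 0.09          # assumed head radius in metres (spherical-head model)
C = 343.0         # speed of sound, m/s
FREQ = 3000.0     # assumed tone frequency, Hz; one period (~333 us) is
                  # shorter than the full ITD range (~+/-262 us), so the
                  # raw likelihood has phase-wrapped "phantom" peaks
SIGMA = 20e-6     # assumed ITD measurement noise (standard deviation), s

# Grid of candidate source azimuths in the fixed world frame, radians.
thetas = np.deg2rad(np.arange(-180.0, 180.0, 1.0))

def itd(theta_rel):
    """ITD for a source at azimuth theta_rel relative to the head
    (Woodworth-style spherical-head approximation)."""
    return (A / C) * np.sin(theta_rel)

def likelihood(tau_obs, head_angle):
    """Likelihood of an observed ITD over every world azimuth, given the
    current head angle. For a pure tone the ITD is only known modulo one
    period, so the residual is folded into [-T/2, T/2]; this folding is
    what creates phantom images (in addition to the front-back mirror)."""
    period = 1.0 / FREQ
    residual = itd(thetas - head_angle) - tau_obs
    residual = (residual + period / 2.0) % period - period / 2.0
    return np.exp(-0.5 * (residual / SIGMA) ** 2)

true_theta = np.deg2rad(40.0)   # assumed source position for the demo
rng = np.random.default_rng(0)
posterior = np.full(thetas.shape, 1.0 / thetas.size)  # flat prior

# Recursive Bayesian update across a sequence of head rotations. Each new
# head angle shifts the phantom peaks in the world frame while the true
# peak stays put, so their product collapses onto the real source.
for head_angle in np.deg2rad([0.0, 15.0, 30.0, -20.0]):
    tau_obs = itd(true_theta - head_angle) + rng.normal(0.0, SIGMA)
    posterior = posterior * likelihood(tau_obs, head_angle)
    posterior /= posterior.sum()

print(f"MAP azimuth: {np.rad2deg(thetas[np.argmax(posterior)]):.0f} deg "
      f"(true: {np.rad2deg(true_theta):.0f} deg)")

In a real system the observed ITD would come from cross-correlating the two microphone signals rather than being simulated, and a full model would track multiple sources across frequency bands; the point of the sketch is only that rotating the head converts an ambiguous likelihood into an unambiguous posterior.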