Wang DL, Brown GJ.
Department of Computer and Information Science and Center for Cognitive Science, The Ohio State University, Columbus, OH 43210-1277, USA.
IEEE Trans Neural Netw. 1999;10(3):684-97. doi: 10.1109/72.761727.
A multistage neural model is proposed for an auditory scene analysis task: segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized relaxation oscillators, each of which corresponds to an auditory feature, and different streams are represented by desynchronized oscillator populations. Lateral connections between oscillators encode harmonicity and proximity in frequency and time. The oscillator network is preceded by a model of the auditory periphery and a stage in which mid-level auditory representations are formed. The model has been systematically evaluated using a corpus of voiced speech mixed with interfering sounds, and it yields an improvement in signal-to-noise ratio for every mixture. The performance of the model is compared with that of other studies on computational auditory scene analysis. Issues including biological plausibility and real-time implementation are also discussed.
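The central mechanism described in the abstract, synchronization of relaxation oscillators linked by excitatory lateral connections, can be illustrated with a minimal sketch. The code below assumes the Terman-Wang relaxation oscillator form used in LEGION-style networks (dx/dt = 3x - x^3 + 2 - y + I, dy/dt = epsilon * (gamma * (1 + tanh(x / beta)) - y)); the threshold coupling scheme, the two-oscillator setup, and all parameter values are illustrative assumptions for exposition, not the paper's actual network or training corpus.

```python
import numpy as np

def simulate(steps=20000, dt=0.01, eps=0.02, beta=0.1,
             gamma=6.0, I=0.8, W=1.0, seed=0):
    """Euler integration of two coupled Terman-Wang relaxation
    oscillators. Parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=2)   # fast (excitatory) variables
    y = rng.uniform(0.0, 6.0, size=2)    # slow (recovery) variables
    trace = np.empty((steps, 2))
    for t in range(steps):
        # Lateral excitatory coupling: each oscillator receives input
        # when the other is in its active phase (simple threshold rule,
        # an assumption standing in for the paper's weighted connections).
        S = W * (x[::-1] > 0.0)
        dx = 3.0 * x - x**3 + 2.0 - y + I + S   # fast equation
        dy = eps * (gamma * (1.0 + np.tanh(x / beta)) - y)  # slow equation
        x = x + dt * dx
        y = y + dt * dy
        trace[t] = x
    return trace

trace = simulate()
# With W > 0 the two oscillators phase-lock, modeling features that
# belong to one stream; with W = 0 their phases drift apart, the
# desynchronized state that would represent separate streams.
```

The design point this sketch captures is why oscillatory correlation suits grouping: binding is expressed in phase rather than in dedicated "grouping" units, so a single network can represent a variable number of streams simply as distinct synchronized populations.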