Liu F, Yamaguchi Y, Shimizu H
Faculty of Pharmaceutical Sciences, University of Tokyo, Japan.
Biol Cybern. 1994;71(2):105-14. doi: 10.1007/BF00197313.
We propose a new model for speaker-independent vowel recognition that exploits the flexible dynamic linking produced by the synchronization of oscillating neural units. The system consists of an input layer and three neural layers, referred to as the A-, B- and C-centers. The input signals are a time series of linear predictive coding (LPC) spectrum envelopes of auditory signals. At each time window within the series, the A-center receives the input signals, extracts the local peaks of the spectrum envelope, i.e., the formants, and encodes them into local groups of independent oscillations. Speaker-independent vowel characteristics are embedded as a connection matrix in the B-center according to statistical data on Japanese vowels. The associative interaction within the B-center and the reciprocal interaction between the A- and B-centers selectively activate a vowel as a globally synchronized pattern over the two centers. The C-center evaluates the synchronized activities among the three formant regions and selects one output category among the five Japanese vowels. Flexible dynamic linking among features is thus achieved across the three centers. We investigated the capability of the system for speaker-independent recognition of Japanese vowels. Its recognition performance was very similar to that of human listeners, including on misleading vowels. In addition, it showed stable recognition for unsteady input signals and robustness against background noise. The optimum oscillation frequency is discussed in comparison with the stimulus-dependent synchronization observed in neurophysiological experiments on the cortex.
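The two mechanisms named in the abstract, picking formants as local peaks of a spectrum envelope and reading out synchrony among oscillating units, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the envelope is a synthetic sum of Gaussian bumps rather than a real LPC envelope, the formant values (800, 1200, 2500 Hz) are rough placeholders, the oscillators follow a generic Kuramoto phase model rather than the authors' neural units, and the function names `synthetic_envelope`, `local_peaks`, `phase_coherence` and `simulate_coherence` are hypothetical.

```python
import math


def synthetic_envelope(formants_hz, freqs_hz, bandwidth_hz=150.0):
    """Toy spectrum envelope: one Gaussian bump per formant (illustrative only)."""
    return [
        sum(math.exp(-((f - fc) / bandwidth_hz) ** 2) for fc in formants_hz)
        for f in freqs_hz
    ]


def local_peaks(envelope, max_peaks=3):
    """Return indices of the strongest local maxima, in ascending index order."""
    peaks = [
        i
        for i in range(1, len(envelope) - 1)
        if envelope[i - 1] < envelope[i] >= envelope[i + 1]
    ]
    # Keep only the max_peaks highest maxima, then restore frequency order.
    peaks.sort(key=lambda i: envelope[i], reverse=True)
    return sorted(peaks[:max_peaks])


def phase_coherence(phases):
    """Kuramoto order parameter: 1.0 means perfectly synchronized phases."""
    n = len(phases)
    re = sum(math.cos(p) for p in phases) / n
    im = sum(math.sin(p) for p in phases) / n
    return math.hypot(re, im)


def simulate_coherence(coupling, freqs_hz=(38.0, 40.0, 42.0),
                       dt=0.001, steps=2000):
    """Euler-integrate three Kuramoto oscillators and return final coherence.

    Assumed model (not the paper's):
        d(theta_i)/dt = omega_i + (K / N) * sum_j sin(theta_j - theta_i)
    """
    omegas = [2.0 * math.pi * f for f in freqs_hz]
    phases = [0.0, 1.0, 2.5]  # arbitrary initial phases in radians
    n = len(phases)
    for _ in range(steps):
        updated = []
        for i in range(n):
            pull = sum(math.sin(phases[j] - phases[i]) for j in range(n))
            updated.append(phases[i] + dt * (omegas[i] + coupling * pull / n))
        phases = updated
    return phase_coherence(phases)


if __name__ == "__main__":
    freqs = list(range(0, 4001, 50))  # frequency grid in Hz
    envelope = synthetic_envelope([800.0, 1200.0, 2500.0], freqs)
    print("formant estimates (Hz):", [freqs[i] for i in local_peaks(envelope)])
    print("coherence, coupled:  ", simulate_coherence(coupling=100.0))
    print("coherence, uncoupled:", simulate_coherence(coupling=0.0))
```

In this toy setting, one oscillator stands in for each of the three formant regions, and a coherence close to 1 plays the role of the globally synchronized pattern that the C-center is described as evaluating. How the actual model couples its units through the B-center connection matrix is not specified in the abstract and is not reproduced here.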