Elman JL, Zipser D. Learning the hidden structure of speech.
Department of Linguistics, University of California, San Diego, La Jolla 92093.
J Acoust Soc Am. 1988 Apr;83(4):1615-26. doi: 10.1121/1.395916.
In the work described here, the backpropagation neural network learning procedure is applied to the analysis and recognition of speech. This procedure takes a set of input/output pattern pairs and attempts to learn their functional relationship; it develops the necessary representational features during the course of learning. A series of computer simulation studies was carried out to assess the ability of these networks to accurately label sounds, to learn to recognize sounds without labels, and to learn feature representations of continuous speech. These studies demonstrated that the networks can learn to label presegmented test tokens with accuracies of up to 95%. Networks trained on segmented sounds using a strategy that requires no external labels were able to recognize and delineate sounds in continuous speech. These networks developed rich internal representations that included units which corresponded to such traditional distinctions as vowels and consonants, as well as units that were sensitive to novel and nonstandard features. Networks trained on a large corpus of unsegmented, continuous speech without labels also developed interesting feature representations, which may be useful in both segmentation and label learning. The results of these studies, while preliminary, demonstrate that backpropagation learning can be used with complex, natural data to identify a feature structure that can serve as the basis for both analysis and nontrivial pattern recognition.
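The abstract's core procedure, learning the functional relationship between input/output pattern pairs by backpropagation while the hidden units develop their own feature representations, can be sketched minimally. The sketch below is illustrative only: it uses a toy XOR pattern set in place of the paper's spectrogram/label pairs, and the network size, learning rate, and epoch count are arbitrary choices, not values from the study.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy input/output pattern pairs (XOR), standing in for the
# speech-sound / label pairs used in the paper's simulations.
pairs = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
         ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

HIDDEN = 3   # hidden units; their activations are the learned "features"
lr = 0.5     # learning rate (arbitrary for this toy problem)

# Each weight vector carries a trailing bias term (hence the +1 columns).
w_hid = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(HIDDEN)]
w_out = [random.uniform(-1, 1) for _ in range(HIDDEN + 1)]

def forward(x):
    xi = x + [1.0]  # append bias input
    h = [sigmoid(sum(w * v for w, v in zip(ws, xi))) for ws in w_hid]
    y = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return xi, h, y

for epoch in range(20000):
    for x, target in pairs:
        xi, h, y = forward(x)
        # Output-layer error signal (sigmoid derivative times error).
        d_out = (target - y) * y * (1.0 - y)
        # Backpropagate the error signal to the hidden layer.
        d_hid = [d_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(HIDDEN)]
        hi = h + [1.0]
        for j in range(HIDDEN + 1):
            w_out[j] += lr * d_out * hi[j]
        for j in range(HIDDEN):
            for k in range(3):
                w_hid[j][k] += lr * d_hid[j] * xi[k]

# Thresholded network outputs for the four training patterns.
preds = [round(forward(x)[2]) for x, _ in pairs]
print(preds)
```

After training, the hidden-unit activations play the role the abstract describes: representational features the network developed on its own, which the output layer then uses for labeling. The paper's unsupervised variants (learning without external labels) use a different training signal and are not shown here.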