Daragh E. Sibley, Christopher T. Kello, David C. Plaut, Jeffrey L. Elman
Department of Psychology, George Mason University.
Cogn Sci. 2008 Jun 1;32(4):741-754. doi: 10.1080/03640210802066964.
The forms of words as they appear in text and speech are central to theories and models of lexical processing. Nonetheless, current methods for simulating their learning and representation fail to approach the scale and heterogeneity of real wordform lexicons. A connectionist architecture termed the sequence encoder is used to learn nearly 75,000 wordform representations through exposure to strings of stress-marked phonemes or letters. First, the mechanisms and efficacy of the sequence encoder are demonstrated and shown to overcome problems with traditional slot-based codes. Then, two large-scale simulations are reported that learned to represent lexicons of either phonological or orthographic wordforms. In doing so, the models learned the statistics of their lexicons, as shown by better processing of well-formed pseudowords than of ill-formed (scrambled) pseudowords, and by accounting for variance in well-formedness ratings. Finally, it is discussed how the sequence encoder may be integrated into broader models of lexical processing.
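The abstract's reference to "problems with traditional slot-based codes" can be illustrated concretely. The following is a minimal sketch, not the paper's model: in a slot-based code, each symbol occupies a fixed position, so a single inserted phoneme or letter shifts every subsequent symbol out of alignment, and two closely related wordforms can end up sharing no slots at all. (The function names and the 5-slot limit here are illustrative assumptions, not from the source.)

```python
def slot_code(word, n_slots=5):
    """Left-aligned slot code: one symbol per fixed position, padded with None."""
    return list(word) + [None] * (n_slots - len(word))

def slot_overlap(a, b):
    """Number of slot positions where two codes carry the same symbol."""
    return sum(x == y and x is not None
               for x, y in zip(slot_code(a), slot_code(b)))

# A shared suffix aligns well under left-aligned slots...
print(slot_overlap("cat", "cats"))  # -> 3 shared slots

# ...but one leading symbol shifts every slot, destroying all overlap,
# even though "cat" and "scat" share three of four symbols in sequence.
print(slot_overlap("cat", "scat"))  # -> 0 shared slots
```

A sequence-based learner like the sequence encoder sidesteps this by processing symbols one at a time rather than binding them to fixed positions, which is what lets it scale to heterogeneous real lexicons.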