Batchelder, Eleanor Olds
The Graduate Center of the City University of New York, New York, USA.
Cognition. 2002 Mar;83(2):167-206. doi: 10.1016/s0010-0277(02)00002-1.
Prelinguistic infants must find a way to isolate meaningful chunks from the continuous streams of speech that they hear. BootLex, a new model that uses distributional cues to build a lexicon, demonstrates how much can be accomplished using this single source of information. This conceptually simple probabilistic algorithm achieves significant segmentation results on various kinds of language corpora (English, Japanese, and Spanish; child- and adult-directed speech and written texts; and several variations in coding structure) and reveals which statistical characteristics of the input influence segmentation performance. BootLex is then compared, quantitatively and qualitatively, with three other groups of computational models of the same infant segmentation process, with particular attention to the functional characteristics of the models and their similarity to human cognition. Commonalities and contrasts among the models are discussed, as well as their implications both for theories of the cognitive problem of segmentation itself and for the general enterprise of computational cognitive modeling.