Christiansen Morten H, Onnis Luca, Hockema Stephen A
Department of Psychology, Cornell University, Ithaca, NY 14853, USA.
Dev Sci. 2009 Apr;12(3):388-95. doi: 10.1111/j.1467-7687.2009.00824.x.
When learning language, young children are faced with many seemingly formidable challenges, including discovering words embedded in a continuous stream of sounds and determining what role these words play in syntactic constructions. We suggest that knowledge of phoneme distributions may play a crucial part in helping children segment words and determine their lexical category, and we propose an integrated model of how children might go from unsegmented speech to lexical categories. We corroborated this theoretical model using a two-stage computational analysis of a large corpus of English child-directed speech. First, we used transition probabilities between phonemes to find words in unsegmented speech. Second, we used distributional information about word edges--the beginning and ending phonemes of words--to predict whether the segmented words from the first stage were nouns, verbs, or something else. The results indicate that discovering lexical units and their associated syntactic category in child-directed speech is possible by attending to the statistics of single phoneme transitions and word-initial and final phonemes. Thus, we suggest that a core computational principle in language acquisition is that the same source of information is used to learn about different aspects of linguistic structure.
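The two-stage analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the toy corpus of made-up words ("ba", "di", "gu"), the fixed boundary threshold, the category labels, and the majority-vote classifier over word-edge phonemes, which is a simple stand-in and not necessarily the authors' method. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probs(utterances):
    """Estimate forward transition probabilities P(next | current)
    from unsegmented phoneme sequences."""
    pair_counts = Counter()
    first_counts = Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


def segment(utt, tp, threshold):
    """Stage 1 (assumed heuristic): place a word boundary wherever the
    phoneme-to-phoneme transition probability falls below `threshold`."""
    if not utt:
        return []
    words, current = [], [utt[0]]
    for a, b in zip(utt, utt[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append(current)
            current = []
        current.append(b)
    words.append(current)
    return words


def train_edge_model(labelled_words):
    """Stage 2 (assumed stand-in classifier): count lexical categories
    for each (first phoneme, last phoneme) pair."""
    counts = defaultdict(Counter)
    for phonemes, category in labelled_words:
        counts[(phonemes[0], phonemes[-1])][category] += 1
    return counts


def predict_category(word, model, default="other"):
    """Return the most frequent category seen for this word's edge pair,
    or `default` if the pair was never observed."""
    votes = model.get((word[0], word[-1]))
    return votes.most_common(1)[0][0] if votes else default


# Toy data (hypothetical): each character stands for one phoneme.
corpus = ["badigu", "gubadi", "diguba", "bagudi", "dibagu"]
tp = transition_probs(corpus)

# Within-word transitions (b->a, d->i, g->u) have probability 1.0 here,
# while cross-word transitions are lower, so a 0.9 threshold separates them.
words = segment("gudiba", tp, threshold=0.9)

# Hypothetical category labels for the toy words.
edge_model = train_edge_model([("ba", "noun"), ("ba", "noun"), ("di", "verb")])
categories = [predict_category(w, edge_model) for w in words]
```

In this sketch the same phoneme-level statistics feed both stages: transitions between phonemes propose word boundaries, and the first and last phonemes of each proposed word serve as the only cues for its category. A real corpus would need a phonemic transcription and a less brittle boundary criterion than a single fixed threshold.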