Laboratoire de Sciences Cognitives et Psycholinguistique (EHESS-ENS-CNRS); Laboratory for Language Development, RIKEN Brain Science Institute, Saitama, Japan.
Cogn Sci. 2013 Jan-Feb;37(1):103-24. doi: 10.1111/j.1551-6709.2012.01267.x. Epub 2012 Sep 17.
Before the end of the first year of life, infants begin to lose the ability to perceive distinctions between sounds that are not phonemic in their native language. It is typically assumed that this developmental change reflects the construction of language-specific phoneme categories, but how these categories are learned largely remains a mystery. Peperkamp, Le Calvez, Nadal, and Dupoux (2006) present an algorithm that can discover phonemes using the distributions of allophones as well as the phonetic properties of the allophones and their contexts. We show that a third type of information source, the occurrence of pairs of minimally differing word forms in speech heard by the infant, is also useful for learning phonemic categories and is in fact more reliable than purely distributional information in data containing a large number of allophones. In our model, learners build an approximation of the lexicon consisting of the high-frequency n-grams present in their speech input, allowing them to take advantage of top-down lexical information without needing to learn words. This may explain how infants have already begun to exhibit sensitivity to phonemic categories before they have a large receptive lexicon.
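The proto-lexicon idea in the abstract can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the authors' implementation. It assumes that utterances arrive as sequences of segment symbols. It approximates the lexicon as the `top_k` most frequent segment n-grams; the n-gram lengths and `top_k` are illustrative values, not taken from the paper. For every pair of segments, it then counts how many proto-lexicon entries differ only by substituting one segment for the other. The abstract does not say how these counts are turned into phoneme-versus-allophone decisions, so that step is not reproduced here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_proto_lexicon(utterances, n_values=(2, 3, 4), top_k=1000):
    """Approximate a lexicon as the top_k most frequent segment n-grams.

    utterances: iterable of segment sequences (e.g. lists of phone symbols).
    n_values and top_k are hypothetical, illustrative parameters.
    """
    counts = Counter()
    for utt in utterances:
        for n in n_values:
            for i in range(len(utt) - n + 1):
                counts[tuple(utt[i:i + n])] += 1
    # No segmentation into words is attempted: frequent n-grams stand in for words.
    return {gram for gram, _ in counts.most_common(top_k)}


def minimal_pair_counts(lexicon):
    """Count, for each unordered segment pair (a, b), the pairs of entries
    in the lexicon that differ only by an a <-> b substitution at one position."""
    # Key each entry by (prefix, suffix) around one position; entries sharing
    # a key differ only in the segment at that position.
    groups = defaultdict(set)
    for word in lexicon:
        for i, seg in enumerate(word):
            groups[(word[:i], word[i + 1:])].add(seg)
    counts = Counter()
    for segs in groups.values():
        for a, b in combinations(sorted(segs), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    stream = [["p", "a", "t", "a"], ["b", "a", "t", "a"], ["p", "a", "t", "a"]]
    lex = build_proto_lexicon(stream, n_values=(4,), top_k=10)
    print(minimal_pair_counts(lex))  # → Counter({('b', 'p'): 1})
```

Grouping entries by the (prefix, suffix) surrounding each position finds every single-substitution pair without comparing all entries against each other. Two distinct entries of equal length that differ at exactly one position share exactly one such key, so each minimal pair is counted once.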