Department of Linguistics, Stanford University, United States.
Cognition. 2013 Jun;127(3):439-53. doi: 10.1016/j.cognition.2013.02.002. Epub 2013 Apr 2.
Word frequencies in natural language follow a highly skewed Zipfian distribution, but the consequences of this distribution for language acquisition are only beginning to be understood. Typically, learning experiments that are meant to simulate language acquisition use uniform word frequency distributions. We examine the effects of Zipfian distributions using two artificial language paradigms: a standard forced-choice task and a new orthographic segmentation task in which participants click on the boundaries between words in context. Our data show that learners can identify word forms robustly across widely varying frequency distributions. In addition, although performance in recognizing individual words is predicted best by their frequency, a Zipfian distribution facilitates word segmentation in context: the presence of high-frequency words creates more chances for learners to apply their knowledge in processing new sentences. We find that computational models that implement "chunking" are more effective than "transition finding" models at reproducing this pattern of performance.
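As a rough illustration of the frequency manipulation described in the abstract (not the authors' actual stimuli, lexicon, or parameters), the following Python sketch samples an artificial corpus under a uniform versus a Zipfian word-frequency distribution; the lexicon size, Zipf exponent, and corpus length are arbitrary assumptions chosen for demonstration.

```python
import random
from collections import Counter

def zipf_weights(n, s=1.0):
    """Zipf's law: the k-th most frequent word has weight proportional to 1/k^s."""
    return [1.0 / (k ** s) for k in range(1, n + 1)]

# Hypothetical 9-word artificial lexicon and corpus length (illustrative only).
lexicon = [f"word{i}" for i in range(1, 10)]
corpus_len = 900

uniform_corpus = random.choices(lexicon, k=corpus_len)
zipf_corpus = random.choices(lexicon, weights=zipf_weights(len(lexicon)), k=corpus_len)

print("Uniform token counts:", Counter(uniform_corpus).most_common(3))
print("Zipfian token counts:", Counter(zipf_corpus).most_common(3))

# Under the Zipfian corpus, a handful of high-frequency word types dominate the
# tokens. The abstract's claim is that such frequent "anchor" words give
# learners more opportunities to segment the lower-frequency words that occur
# next to them in new sentences.
```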