Piantadosi Steven T
Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA.
Psychon Bull Rev. 2014 Oct;21(5):1112-30. doi: 10.3758/s13423-014-0585-6.
The frequency distribution of words has been a key object of study in statistical linguistics for the past 70 years. This distribution approximately follows a simple mathematical form known as Zipf's law. This article first shows that human language has a highly complex, reliable structure in the frequency distribution over and above this classic law, although prior data visualization methods have obscured this fact. A number of empirical phenomena related to word frequencies are then reviewed. These facts are chosen to be informative about the mechanisms giving rise to Zipf's law and are then used to evaluate many of the theoretical explanations of Zipf's law in language. No prior account straightforwardly explains all the basic facts or is supported with independent evaluation of its underlying assumptions. To make progress at understanding why language obeys Zipf's law, studies must seek evidence beyond the law itself, testing assumptions and evaluating novel predictions with new, independent data.
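As an illustration of the "simple mathematical form" mentioned in the abstract, Zipf's law in its most basic version says that the frequency of the word at rank r is roughly proportional to 1/r^α, with α close to 1. The sketch below is not taken from the article: the function names `rank_frequency` and `zipf_exponent` and the toy corpus are assumptions made for illustration, and the log-log least-squares fit is only a crude estimator of the exponent.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, count) pairs, with the most frequent word at rank 1."""
    counts = Counter(tokens)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


def zipf_exponent(pairs):
    """Estimate alpha as the negated least-squares slope of log(count) on log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(count) for _, count in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical toy corpus: the word at rank r occurs exactly 120 / r times,
# so it follows the idealized law with alpha = 1 by construction.
tokens = []
for r in range(1, 7):
    tokens.extend([f"w{r}"] * (120 // r))

pairs = rank_frequency(tokens)
alpha = zipf_exponent(pairs)
print(pairs)
print(f"estimated alpha = {alpha:.3f}")
```

On real text, ranks and counts taken from the same sample are not independent, so a fit like this can look cleaner than the underlying data warrant. Treat it as a first look, not as a test of the law.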