School of Biological Sciences, University of Reading, Reading, UK.
Philos Trans R Soc Lond B Biol Sci. 2011 Apr 12;366(1567):1101-7. doi: 10.1098/rstb.2010.0315.
We present data from 17 languages on the frequency with which a common set of words is used in everyday language. The languages are drawn from six language families representing 65 per cent of the world's 7000 languages. Our data were collected from linguistic corpora that record frequencies of use for the 200 meanings in the widely used Swadesh fundamental vocabulary. Our interest is to assess evidence for shared patterns of language use around the world, and for the relationship of language use to rates of lexical replacement, defined as the replacement of a word by a new unrelated or non-cognate word. Frequencies of use for words in the Swadesh list range from just a few per million words of speech to 191 000 or more. The average inter-correlation among languages in the frequency of use across the 200 words is 0.73 (p < 0.0001). The first principal component of these data accounts for 70 per cent of the variance in frequency of use. Elsewhere, we have shown that frequently used words in the Indo-European languages tend to be more conserved, and that this relationship holds separately for different parts of speech. A regression model combining the principal factor loadings derived from the worldwide sample along with their part of speech predicts 46 per cent of the variance in the rates of lexical replacement in the Indo-European languages. This suggests that Indo-European lexical replacement rates might be broadly representative of worldwide rates of change. Evidence for this speculation comes from using the same factor loadings and part-of-speech categories to predict a word's position in a list of 110 words ranked from slowest to most rapidly evolving among 14 of the world's language families. This regression model accounts for 30 per cent of the variance. 
Our results point to a remarkable regularity in the way that human speakers use language, and hint that the words for a shared set of meanings have been slowly evolving and others more rapidly evolving throughout human history.
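The two summary statistics reported above, the average inter-correlation among languages and the share of variance captured by the first principal component, can be illustrated with a minimal sketch. It runs on synthetic data only, not the authors' corpus frequencies. The function names, the 200 x 17 matrix layout (words in rows, languages in columns), the use of log-scale frequencies and the noise level are all assumptions made for illustration; the abstract does not say how the authors computed these quantities.

```python
import numpy as np


def mean_pairwise_correlation(freq):
    """Average off-diagonal Pearson correlation between language columns."""
    corr = np.corrcoef(freq, rowvar=False)      # languages x languages
    n = corr.shape[0]
    off_diag = corr[~np.eye(n, dtype=bool)]     # drop the diagonal of 1s
    return float(off_diag.mean())


def first_pc_variance_share(freq):
    """Share of total variance on the first principal component.

    Uses the correlation matrix, i.e. PCA on standardized columns.
    """
    corr = np.corrcoef(freq, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)          # ascending order
    return float(eigvals[-1] / eigvals.sum())


# Synthetic stand-in for the data (an assumption, not the paper's values):
# each word has one shared "usage" score, plus language-specific noise.
rng = np.random.default_rng(0)
n_words, n_languages = 200, 17
shared = rng.normal(size=(n_words, 1))
noise = rng.normal(size=(n_words, n_languages))
log_freq = shared + 0.6 * noise                 # hypothetical log frequencies

mean_r = mean_pairwise_correlation(log_freq)
pc1_share = first_pc_variance_share(log_freq)
print(f"mean inter-correlation: {mean_r:.2f}")
print(f"variance on first PC:   {pc1_share:.2f}")
```

For any correlation matrix of n variables, the largest eigenvalue is at least 1 + (n - 1) times the mean off-diagonal correlation. A high average inter-correlation therefore implies a dominant first component, which is consistent with the 0.73 and 70 per cent figures being reported together.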