

Culturomics as a data playground for tests of selection: Mathematical approaches to detecting selection in word use.

Author information

Sindi Suzanne S, Dale Rick

Affiliations

Applied Mathematics, University of California, Merced, USA.

Cognitive and Information Sciences, University of California, Merced, USA.

Publication information

J Theor Biol. 2016 Sep 21;405:140-9. doi: 10.1016/j.jtbi.2015.12.012. Epub 2016 Jan 21.

Abstract

In biological evolution, traits may rise and fall in frequency through genetic drift, where variant frequencies change by chance, or through selection, where advantageous variants rise in frequency. The neutral model of evolution, first developed by Kimura in the 1960s, has become the standard against which selection is detected. While the balance between these two forces, drift and selection, is well established in biology, in other domains their relative contributions are still being worked out. Although the idea of natural selection has been applied to the cultural domain since the time of Darwin, positively identifying cultural traits under selection has proven more challenging, both because established tests for selection were lacking and because large cultural data sets were lacking. In recent years, however, with the accumulation of large cultural data sets, many cultural features, from prehistoric pottery to modern baby names, have been shown to evolve according to the neutral theory. But accumulating empirical evidence from cultural processes suggests that the neutral theory alone cannot account for all features of the data. As such, there has been renewed interest in determining whether there is selection amidst drift. Here we analyze a subset of English word frequencies and determine whether frequency change reveals processes of selection. Inspired by the Moran and Wright-Fisher models in population genetics, we developed a neutral model of word frequency variation to assess when linguistic data appear to depart from neutral evolution. As such, our model represents a possible "test for selection" in the linguistic domain. We explore how the distribution of word use has changed for sets of words in English over more than 100 years (1901-2008), as expressed in vocabulary usage in published books made available by Google Ngram. When comparing empirical word frequency changes to our neutral model, we find pervasive and systematic departures from neutrality.
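
The abstract does not spell out the neutral model, so the following is only a minimal sketch of the general idea, assuming a plain Wright-Fisher resampling scheme: a fixed number of word tokens per generation, no new words entering, and no selection. It is not the authors' model. The function names (`wright_fisher_step`, `simulate`, `drift_p_value`) and all parameter values are hypothetical, chosen for illustration. The last function shows one possible way to compare an observed count change with what drift alone produces.

```python
import random


def wright_fisher_step(counts, rng):
    """One neutral generation: each of the N tokens copies a word drawn
    with probability proportional to that word's current count."""
    n_tokens = sum(counts)
    words = list(range(len(counts)))
    draws = rng.choices(words, weights=counts, k=n_tokens)
    new_counts = [0] * len(counts)
    for w in draws:
        new_counts[w] += 1
    return new_counts


def simulate(counts, generations, seed=0):
    """Return the list of count vectors, starting with the initial one."""
    rng = random.Random(seed)
    current = list(counts)
    history = [list(current)]
    for _ in range(generations):
        current = wright_fisher_step(current, rng)
        history.append(current)
    return history


def drift_p_value(counts, word, observed_final, generations,
                  replicates=200, seed=0):
    """Fraction of neutral replicates in which the count of `word` moves
    at least as far from its starting value as the observed change."""
    rng = random.Random(seed)
    start = counts[word]
    observed_change = abs(observed_final - start)
    hits = 0
    for _ in range(replicates):
        current = list(counts)
        for _ in range(generations):
            current = wright_fisher_step(current, rng)
        if abs(current[word] - start) >= observed_change:
            hits += 1
    return hits / replicates


if __name__ == "__main__":
    # Hypothetical starting counts for four words (not real Ngram data).
    start_counts = [500, 300, 150, 50]
    history = simulate(start_counts, generations=20, seed=1)
    print("final counts under drift:", history[-1])
    # Hypothetical observation: word 3 rose from 50 to 200 tokens.
    p = drift_p_value(start_counts, word=3, observed_final=200,
                      generations=20, replicates=200, seed=2)
    print("share of neutral runs with a change this large:", p)
```

A small value from `drift_p_value` would mean that drift alone rarely produces a change as large as the one observed, which is the logic of a test for selection. Applying this to real data would also need a decision on how corpus size changes from year to year, which this sketch ignores.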

