Rama Taraka, Wichmann Søren
Department of Linguistics, University of North Texas, Denton, Texas, United States of America.
Leiden University Centre for Linguistics, University of Leiden, Leiden, Netherlands.
PLoS One. 2020 Aug 12;15(8):e0236522. doi: 10.1371/journal.pone.0236522. eCollection 2020.
In current practice, when dating the root of a Bayesian language phylogeny, the researcher must supply some information beforehand, including a distribution of root ages and dates for some nodes that serve as calibration points. Besides leaving room for subjectivity, this creates a problem: many of the world's language families have no available internal calibration points. Here we address the following questions: can a new Bayesian framework that overcomes these problems be introduced, and how well does it perform? The framework we present is generalized in the sense that it needs no family-specific priors or calibration points. We also remove another potential source of subjectivity in Bayesian tree inference as commonly practiced, namely manual cognate identification, by applying an automated approach instead. Dates are obtained by fitting a Gamma regression model to tree lengths and known time depths for 30 phylogenetically independent calibration points. This model is then used to predict the time depths of both the root and the internal nodes for 116 language families, producing a total of 1,287 dates for families and subgroups. The results turn out to be similar to those of published Bayesian studies of individual language families. The performance of the method is compared to automated glottochronology, an update of Swadesh's classical method that draws on automated cognate recognition and a new formula for deriving a time depth from percentages of shared cognates. It is also compared to a third dating method, that of the Automated Similarity Judgment Program (ASJP). In terms of errors and correlations with known dates, ASJP works better than the new method, and both work better than automated glottochronology.
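The dating step summarized above (regress known time depths on tree lengths with a Gamma model, then predict depths for uncalibrated nodes) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: a log link, raw tree length as the only predictor, and fitting by iteratively reweighted least squares (IRLS). The function names and the toy numbers are hypothetical; they are not the authors' code or calibration data.

```python
import math


def ols(x, z):
    """Ordinary least squares for z = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    return mz - b1 * mx, b1


def fit_gamma_log_link(tree_lengths, depths, max_iter=100, tol=1e-12):
    """Fit E[depth] = exp(b0 + b1 * tree_length) as a Gamma GLM (assumed log link).

    For a Gamma response with log link the IRLS working weights are all
    equal, so each step is an unweighted OLS fit to the working response
    z = eta + (y - mu) / mu.
    """
    # Initial guess: OLS on log(depth).
    b0, b1 = ols(tree_lengths, [math.log(d) for d in depths])
    for _ in range(max_iter):
        eta = [b0 + b1 * t for t in tree_lengths]
        mu = [math.exp(e) for e in eta]
        z = [e + (y - m) / m for e, y, m in zip(eta, depths, mu)]
        new_b0, new_b1 = ols(tree_lengths, z)
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


def predict_depth(b0, b1, tree_length):
    """Predicted time depth, in the same units as the calibration dates."""
    return math.exp(b0 + b1 * tree_length)


if __name__ == "__main__":
    # Hypothetical toy calibration points: (tree length, known age in years).
    lengths = [0.5, 1.0, 1.5, 2.0, 2.5]
    ages = [900.0, 1800.0, 2500.0, 4200.0, 5000.0]
    b0, b1 = fit_gamma_log_link(lengths, ages)
    print(f"b0={b0:.4f}, b1={b1:.4f}")
    print(f"predicted depth at tree length 1.8: {predict_depth(b0, b1, 1.8):.0f} years")
```

At convergence the loop satisfies the Gamma log-link score equations, sum((y - mu) / mu) = 0 and sum(x * (y - mu) / mu) = 0. For real analyses a GLM library (for example statsmodels' `GLM` with a Gamma family and log link) would also report standard errors and dispersion.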