Leiden University Centre for Linguistics, Leiden University, Postbus 9515, Leiden 2300 RA, The Netherlands.
Laboratory for Quantitative Linguistics, Kazan Federal University, Kremlevskaya Street 18, Kazan 420000, Russia.
Philos Trans R Soc Lond B Biol Sci. 2021 May 10;376(1824):20200202. doi: 10.1098/rstb.2020.0202. Epub 2021 Mar 22.
Two families of quantitative methods have been used to infer geographical homelands of language families: Bayesian phylogeography and the 'diversity method'. Bayesian methods model how populations may have moved using a phylogenetic tree as a backbone, while the diversity method assumes that the geographical area where linguistic diversity is highest likely corresponds to the homeland. No systematic tests of the performance of these methods in a linguistic context have so far been published. Here, we carry out performance testing by simulating language families, including branching structures and word lists, along with speaker populations moving in space. We test six different methods: two versions of BayesTraits; the relaxed random walk model of BEAST 2; our own RevBayes implementations of a fixed-rate and a variable-rates random walk model; and the diversity method. As a result of the tests, we propose a hierarchy of performance of the different methods. Factors such as geographical idiosyncrasies, incomplete sampling, tree imbalance and small family sizes all have a negative impact on performance, but they mostly affect all methods alike, so the performance hierarchy is generally impervious to such factors. This article is part of the theme issue 'Reconstructing prehistoric languages'.
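The assumption behind the diversity method can be illustrated with a minimal sketch. It assumes one common formulation, in which each language receives a score equal to the mean ratio of linguistic distance to geographical distance to every other language in the family, and the location of the highest-scoring language is proposed as the homeland. This is not the implementation tested in the article; the function names, coordinates and linguistic distances below are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def diversity_homeland(coords, ling_dist):
    """Score each language by its mean linguistic/geographical distance ratio.

    coords:    {language: (lat, lon)}
    ling_dist: {frozenset({lang_a, lang_b}): linguistic distance}
    Returns (highest-scoring language, {language: score}).
    """
    scores = {}
    for i in coords:
        ratios = []
        for j in coords:
            if i == j:
                continue
            geo = haversine_km(coords[i], coords[j])
            if geo == 0:
                continue  # co-located languages would give an undefined ratio
            ratios.append(ling_dist[frozenset((i, j))] / geo)
        scores[i] = sum(ratios) / len(ratios) if ratios else 0.0
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy family: A, B and C are close together but linguistically
# very different; D is far away and moderately different from all of them.
coords = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (1.0, 0.0), "D": (0.0, 20.0)}
ling_dist = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "D")): 0.5,
    frozenset(("B", "D")): 0.5,
    frozenset(("C", "D")): 0.5,
}

best, scores = diversity_homeland(coords, ling_dist)
print("Proposed homeland: location of language", best)
```

In this toy family the proposed homeland falls within the cluster where large linguistic differences are packed into a small area, and the distant language D receives the lowest score. Unlike the Bayesian approaches, a score of this kind needs no phylogenetic tree.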