Department of Biology, Stanford University, Stanford, CA, 94305, USA.
Department of Mathematics, University of Pisa, 56126, Pisa, Italy.
Bull Math Biol. 2019 Feb;81(2):452-493. doi: 10.1007/s11538-018-0444-0. Epub 2018 Jun 6.
The neighbor-joining algorithm for phylogenetic inference (NJ) has been seen to have three specific properties when applied to distance matrices that contain an admixed taxon: (1) antecedence of clustering, in which the admixed taxon agglomerates with one of its source taxa before the two source taxa agglomerate with each other; (2) intermediacy of distances, in which the distance on an inferred NJ tree between an admixed taxon and either of its source taxa is smaller than the distance between the two source taxa; and (3) intermediacy of path lengths, in which the number of edges separating the admixed taxon and either of its source taxa is less than or equal to the number of edges between the source taxa. We examine the behavior of neighbor-joining on distance matrices containing an admixed group, investigating the occurrence of antecedence of clustering, intermediacy of distances, and intermediacy of path lengths. We first mathematically predict the frequency with which the properties are satisfied for a labeled unrooted binary tree selected uniformly at random in the absence of admixture. We then introduce a taxon constructed by a linear admixture of distances from two source taxa, examining three admixture scenarios by simulation: a model in which distance matrices are chosen at random, a model in which an admixed taxon is added to a set of taxa that reflect treelike evolution, and a model that introduces a perturbation of the treelike scenario. In contrast to previous conjectures, we observe that the three properties are sometimes violated by distance matrices that include an admixed taxon. However, we also find that they are satisfied more often than is expected by chance when the distance matrix contains an admixed taxon, especially when evolution among the non-admixed taxa is treelike. The results contribute to a deeper understanding of the nature of evolutionary trees constructed from data that do not necessarily reflect a treelike evolutionary process.
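The linear admixture construction and two of the three properties can be illustrated with a small sketch. This is not the authors' simulation code. It assumes that the admixed taxon M has distances d(M, k) = α·d(A, k) + (1 − α)·d(B, k) for every taxon k, including the sources A and B themselves, and it uses a plain textbook implementation of neighbor-joining. The function names (`add_admixed_taxon`, `neighbor_joining`, `path`) and the four-taxon example matrix are hypothetical and chosen only for illustration. Antecedence of clustering is not checked here, because it depends on how the final three-way join is treated.

```python
from itertools import combinations

def add_admixed_taxon(D, a, b, alpha):
    """Append a taxon whose distances are alpha*D[a][k] + (1-alpha)*D[b][k].

    Assumption: the same linear rule is applied to the source taxa a and b.
    """
    n = len(D)
    row = [alpha * D[a][k] + (1 - alpha) * D[b][k] for k in range(n)]
    out = [list(D[i]) + [row[i]] for i in range(n)]
    out.append(row + [0.0])
    return out

def neighbor_joining(D):
    """Textbook NJ; returns the unrooted tree as {node: {neighbor: edge_length}}."""
    n = len(D)
    d = {(i, j): D[i][j] for i in range(n) for j in range(n) if i != j}
    active = list(range(n))
    adj = {i: {} for i in range(n)}
    nxt = n  # label for the next internal node
    while len(active) > 3:
        m = len(active)
        r = {i: sum(d[i, k] for k in active if k != i) for i in active}
        # Pick the pair minimizing the Q criterion (m-2)*d(i,j) - r_i - r_j.
        i, j = min(combinations(active, 2),
                   key=lambda p: (m - 2) * d[p] - r[p[0]] - r[p[1]])
        u, nxt = nxt, nxt + 1
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (m - 2))
        lj = d[i, j] - li
        adj[u] = {i: li, j: lj}
        adj[i][u], adj[j][u] = li, lj
        for k in active:
            if k not in (i, j):
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        active = [k for k in active if k not in (i, j)] + [u]
    # Join the last three nodes at a single central node.
    x, y, z = active
    adj[nxt] = {}
    for p, q, s in ((x, y, z), (y, z, x), (z, x, y)):
        length = 0.5 * (d[p, q] + d[p, s] - d[q, s])
        adj[nxt][p] = adj[p][nxt] = length
    return adj

def path(adj, src, dst):
    """Return (total length, number of edges) on the unique tree path."""
    stack = [(src, None, 0.0, 0)]
    while stack:
        node, parent, length, edges = stack.pop()
        if node == dst:
            return length, edges
        for nb, w in adj[node].items():
            if nb != parent:
                stack.append((nb, node, length + w, edges + 1))
    raise ValueError("nodes are not connected")

# Hypothetical treelike input: quartet ((0,1),(2,3)) with all edge lengths 1.
D = [[0, 2, 3, 3],
     [2, 0, 3, 3],
     [3, 3, 0, 2],
     [3, 3, 2, 0]]
A, B, ALPHA = 0, 2, 0.5
D5 = add_admixed_taxon(D, A, B, ALPHA)
M = len(D)  # index of the admixed taxon
tree = neighbor_joining(D5)

(dMA, eMA), (dMB, eMB), (dAB, eAB) = (path(tree, M, A), path(tree, M, B),
                                      path(tree, A, B))
print("intermediacy of distances:   ", dMA < dAB and dMB < dAB)
print("intermediacy of path lengths:", eMA <= eAB and eMB <= eAB)
```

Repeating such a check over many sampled matrices and values of α would give empirical frequencies of the kind the abstract describes. A single example like this one says nothing about how often the properties hold.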