Efromovich Sam, Kubatko Laura Salter
University of Texas at Dallas, USA.
Stat Appl Genet Mol Biol. 2008;7(1):Article2. doi: 10.2202/1544-6115.1319. Epub 2008 Jan 19.
The relationship between speciation times and the corresponding times of gene divergence is of interest in phylogenetic inference as a means of understanding the past evolutionary dynamics of populations and of estimating the timing of speciation events. It has long been recognized that gene divergence times might substantially pre-date speciation events. Although the distribution of the difference between these has previously been studied for the case of two populations, this distribution has not been explicitly computed for larger species phylogenies. Here we derive a simple method for computing this distribution for trees of arbitrary size. A two-stage procedure is proposed which (i) considers the probability distribution of the time from the speciation event at the root of the species tree to the gene coalescent time conditionally on the number of gene lineages available at the root; and (ii) calculates the probability mass function for the number of gene lineages at the root. This two-stage approach dramatically simplifies numerical analysis, because in the first step the conditional distribution does not depend on an underlying species tree, while in the second step the pattern of gene coalescence prior to the species tree root is irrelevant. In addition, the algorithm provides intuition concerning the properties of the distribution with respect to the various features of the underlying species tree. The methodology is complemented by developing probabilistic formulae and software, written in R. The method and software are tested on five-taxon species trees with varying levels of symmetry. The examples demonstrate that more symmetric species trees tend to have larger mean coalescent times and are more likely to have a unimodal gamma-like distribution with a long right tail, while asymmetric trees tend to have smaller mean coalescent times with an exponential-like distribution. 
In addition, species trees with longer branches generally have shorter mean coalescent times, with branches closest to the root of the tree being most influential.
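The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R software. It assumes time is measured in coalescent units, a constant ancestral population size above the root, and the standard closed-form probability g_ij(t) that i lineages have j ancestors after time t. The tree encoding (a `(children, branch_length)` pair, with an integer lineage count for a leaf) and all function names are hypothetical choices made for illustration.

```python
import math

def g(i, j, t):
    """P(i lineages have exactly j ancestors after t coalescent units)."""
    if j < 1 or j > i:
        return 0.0
    total = 0.0
    for k in range(j, i + 1):
        rising_j = math.prod(j + y for y in range(k - 1))  # j(j+1)...(j+k-2)
        falling_i = math.prod(i - y for y in range(k))     # i(i-1)...(i-k+1)
        rising_i = math.prod(i + y for y in range(k))      # i(i+1)...(i+k-1)
        coef = (2 * k - 1) * (-1) ** (k - j) * rising_j * falling_i
        coef /= math.factorial(j) * math.factorial(k - j) * rising_i
        total += math.exp(-k * (k - 1) * t / 2) * coef
    return total

def convolve(p, q):
    """pmf of the sum of two independent lineage counts (index = count)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out

def top_of_branch(node):
    """pmf of the lineage count at the ancestral end of a node's branch."""
    children, length = node
    if isinstance(children, int):            # leaf: fixed number of samples
        bottom = [0.0] * children + [1.0]
    else:                                    # internal node: merge both sides
        bottom = lineages_at_node(children)
    n = len(bottom)
    return [sum(bottom[i] * g(i, j, length) for i in range(n)) for j in range(n)]

def lineages_at_node(children):
    """Stage (ii): pmf of lineages entering a node (use on the root's children)."""
    left, right = children
    return convolve(top_of_branch(left), top_of_branch(right))

def cdf_above_root(pmf, t):
    """Stage (i) mixed over stage (ii): P(coalescence within t above the root)."""
    return sum(pmf[k] * g(k, 1, t) for k in range(1, len(pmf)))

def mean_above_root(pmf):
    """Mean time above the root; given k lineages the mean is 2(1 - 1/k)."""
    return sum(pmf[k] * 2.0 * (1.0 - 1.0 / k) for k in range(2, len(pmf)))

if __name__ == "__main__":
    leaf = (1, 1.0)  # one sampled lineage; its branch length has no effect
    # Hypothetical five-taxon trees with illustrative branch lengths.
    balanced = [([leaf, leaf], 1.0), ([([leaf, leaf], 0.5), leaf], 0.5)]
    caterpillar = [([([([leaf, leaf], 0.5), leaf], 0.5), leaf], 0.5), leaf]
    for name, root in (("balanced", balanced), ("caterpillar", caterpillar)):
        pmf = lineages_at_node(root)
        print(name, [round(p, 4) for p in pmf], round(mean_above_root(pmf), 4))
```

The sketch mirrors the separation noted in the abstract: `g(k, 1, t)` depends only on the number of lineages k at the root, while `lineages_at_node` depends only on the species tree and tracks lineage counts rather than the pattern of coalescence. For a three-taxon tree ((A,B):t, C) under these assumptions the mean works out to 1 + e^{-t}/3, which decreases as the internal branch lengthens, consistent with the trend the abstract reports for longer branches.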