RoyChoudhury Arindam, Felsenstein Joseph, Thompson Elizabeth A
Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
Genetics. 2008 Oct;180(2):1095-105. doi: 10.1534/genetics.107.085753. Epub 2008 Sep 9.
We have developed a pruning algorithm for likelihood estimation of a tree of populations. This algorithm enables us to compute the likelihood for large trees. Thus, it gives an efficient way of obtaining the maximum-likelihood estimate (MLE) for a given tree topology. Our method utilizes the differences accumulated by random genetic drift in allele count data from single-nucleotide polymorphisms (SNPs), ignoring the effect of mutation after divergence from the common ancestral population. The computation of the maximum-likelihood tree involves both maximizing the likelihood over the branch lengths of a given topology and comparing the maximized likelihoods across topologies. Here our focus is the maximization of the likelihood over the branch lengths of a given topology. The pruning algorithm computes arrays of probabilities at the root of the tree from the data at the tips of the tree; at the root, the arrays determine the likelihood. The arrays consist of probabilities related to the number of coalescences and allele counts for the partially coalesced lineages. Computing these probabilities requires an unusual two-stage algorithm. Our computation is exact and avoids time-consuming Monte Carlo methods. We can also correct for ascertainment bias.
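As a rough illustration of the "number of coalescences" ingredient mentioned in the abstract, the following Python sketch evaluates the standard closed-form coalescent probability that n sampled lineages trace back to k ancestral lineages along a single branch of length t. It assumes t is measured in coalescent units (generations divided by the effective number of gene copies), which the abstract does not state. It is not the authors' two-stage pruning algorithm: it covers one branch only, ignores allele counts, and all function names are hypothetical. The alternating sum is numerically reliable only for small n.

```python
import math


def rising(a, j):
    """Rising factorial a * (a + 1) * ... * (a + j - 1); equals 1 when j == 0."""
    out = 1
    for i in range(j):
        out *= a + i
    return out


def falling(a, j):
    """Falling factorial a * (a - 1) * ... * (a - j + 1); equals 1 when j == 0."""
    out = 1
    for i in range(j):
        out *= a - i
    return out


def lineage_count_prob(n, k, t):
    """Probability that n lineages have exactly k ancestors a time t earlier.

    Assumes a neutral coalescent with no mutation, where j lineages
    coalesce at rate j * (j - 1) / 2 and t is in coalescent units.
    """
    if not 1 <= k <= n:
        return 0.0
    total = 0.0
    for j in range(k, n + 1):
        sign = -1.0 if (j - k) % 2 else 1.0
        coeff = (
            (2 * j - 1) * rising(k, j - 1) * falling(n, j)
            / (math.factorial(k) * math.factorial(j - k) * rising(n, j))
        )
        total += sign * coeff * math.exp(-j * (j - 1) * t / 2.0)
    return total


def lineage_count_distribution(n, t):
    """Probabilities for k = 1..n ancestral lineages, returned as a list."""
    return [lineage_count_prob(n, k, t) for k in range(1, n + 1)]
```

For example, `lineage_count_distribution(5, 0.3)` returns five probabilities that sum to 1 up to rounding, and `lineage_count_prob(2, 2, t)` reduces to `exp(-t)`, the chance that two lineages have not yet coalesced. An array of this kind, one entry per possible number of remaining lineages, is the sort of per-branch quantity a pruning pass could combine from the tips toward the root.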