Méthodes et Algorithmes pour la Bioinformatique, LIRMM, Centre National de la Recherche Scientifique, Université de Montpellier, Montpellier Cedex 5, France.
Syst Biol. 2010 May;59(3):307-21. doi: 10.1093/sysbio/syq010. Epub 2010 Mar 29.
PhyML is a phylogeny program based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and its fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the latest version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.
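The abstract states that parsimony is used to discard the least promising subtree pruning and regrafting (SPR) rearrangements before their likelihood is computed. The Python below is a minimal sketch of that filtering idea under our own assumptions, not PhyML's implementation. We use Fitch's small-parsimony algorithm as the parsimony criterion (the abstract does not say which variant PhyML uses), encode rooted binary trees as nested tuples, and treat each candidate SPR rearrangement as an already-built tree. The names `parsimony_score` and `filter_candidates` and the fixed `keep` cutoff are hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, seqs: Dict[str, str], site: int) -> Tuple[Set[str], int]:
    """Fitch pass for one alignment column: return (state set, number of changes)."""
    if isinstance(tree, str):
        return {seqs[tree][site]}, 0
    left_set, left_cost = fitch_site(tree[0], seqs, site)
    right_set, right_cost = fitch_site(tree[1], seqs, site)
    common = left_set & right_set
    if common:
        # Children share a state: no extra change needed at this node.
        return common, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Total Fitch parsimony score summed over all alignment columns."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(n_sites))


def filter_candidates(candidates: List[Tree], seqs: Dict[str, str], keep: int) -> List[Tree]:
    """Keep the `keep` candidate topologies with the lowest parsimony scores.

    In this sketch only the survivors would go on to the (much more expensive)
    likelihood evaluation; the others are discarded without computing a likelihood.
    """
    ranked = sorted(candidates, key=lambda t: parsimony_score(t, seqs))
    return ranked[:keep]
```

Because the Fitch score of an unrooted tree does not depend on where it is rooted, scoring any rooted version of each candidate is enough. A real tree search would presumably score moves incrementally on a single tree rather than rebuilding and rescoring every candidate from scratch.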
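The SH-like branch support mentioned in the abstract compares, for an internal branch, the maximum-likelihood topology against its two nearest-neighbor-interchange alternatives using a nonparametric resampling of per-site log-likelihoods. The Python below is a simplified sketch of such a procedure as we understand it, not PhyML's code. It assumes the per-site log-likelihoods of the three topologies are already computed, uses RELL resampling (reusing per-site values instead of re-optimizing each replicate), centres each replicate on the observed totals, and reports the fraction of replicates in which the observed statistic exceeds the resampled one. The function name `sh_like_support` and its arguments are hypothetical; the paper should be consulted for the exact statistic and centring.

```python
import random
from typing import Sequence


def sh_like_support(site_lnl: Sequence[Sequence[float]], n_boot: int = 1000, seed: int = 0) -> float:
    """Simplified SH-like support for one internal branch (illustrative sketch).

    site_lnl[k][i] is the log-likelihood of site i under topology k, where
    k = 0 is the ML topology and k = 1, 2 are its two NNI alternatives
    around the branch being tested (hypothetical input layout).
    """
    n_topo = len(site_lnl)
    n_sites = len(site_lnl[0])
    totals = [sum(row) for row in site_lnl]
    # Observed statistic: ML log-likelihood minus the best alternative.
    delta = totals[0] - max(totals[1:])

    rng = random.Random(seed)
    wins = 0
    for _ in range(n_boot):
        # RELL: resample site indices with replacement and reuse per-site values.
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        boot = [sum(site_lnl[k][i] for i in idx) for k in range(n_topo)]
        # Centre each replicate on the observed total (its expectation under RELL).
        centred = [boot[k] - totals[k] for k in range(n_topo)]
        delta_star = centred[0] - max(centred[1:])
        if delta > delta_star:
            wins += 1
    return wins / n_boot
```

The returned value lies between 0 and 1; values close to 1 mean that, in this sketch, the observed advantage of the ML topology is rarely matched by the centred resampled statistic. The fixed `seed` only makes the resampling reproducible.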