Department of Electrical Engineering and Computer Science, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Syst Biol. 2013 Jan 1;62(1):110-20. doi: 10.1093/sysbio/sys076. Epub 2012 Sep 4.
Accurate gene tree reconstruction is a fundamental problem in phylogenetics, with many important applications. However, sequence data alone often lack enough information to confidently support one gene tree topology over many competing alternatives. Here, we present a novel framework for combining sequence data and species tree information, and we describe an implementation of this framework in TreeFix, a new phylogenetic program for improving gene tree reconstructions. Given a gene tree (preferably computed using a maximum-likelihood phylogenetic program), TreeFix finds a "statistically equivalent" gene tree that minimizes a species tree-based cost function. We have applied TreeFix to two clades, comprising 12 Drosophila and 16 fungal genomes, as well as to simulated phylogenies, and show that it dramatically improves reconstructions compared with current state-of-the-art programs. Given its accuracy, speed, and simplicity, TreeFix should be applicable to a wide range of analyses and have many important implications for future investigations of gene evolution. The source code and a sample data set are available at http://compbio.mit.edu/treefix.
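The selection rule described above (keep only trees that are statistically equivalent to the input tree, then pick the one with the lowest species tree-based cost) can be illustrated with a minimal Python sketch. It rests on several simplifying assumptions that are not taken from the paper: trees are nested tuples, a fixed log-likelihood tolerance `tol` stands in for a formal statistical test of equivalence, the cost is a weighted duplication-loss count from LCA reconciliation, and the candidate trees are supplied as a fixed list rather than found by searching tree space. All function names are hypothetical and do not describe TreeFix's actual interface.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def leaves(t: Tree) -> set:
    """Return the set of leaf names below t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(leaves(c) for c in t))


def clusters(t: Tree) -> list:
    """All clusters (leaf sets) of a rooted tree; each identifies one node."""
    out = [frozenset(leaves(t))]
    if not isinstance(t, str):
        for c in t:
            out.extend(clusters(c))
    return out


def lca(species_clusters: list, taxa: frozenset) -> frozenset:
    """Smallest species-tree cluster containing all taxa (their LCA node)."""
    return min((c for c in species_clusters if taxa <= c), key=len)


def depth(species_clusters: list, node: frozenset) -> int:
    """Number of proper ancestors of a species node (the root has depth 0)."""
    return sum(1 for c in species_clusters if node < c)


def dup_loss_cost(gene_tree, species_tree, gene_to_species, dup_w=1.0, loss_w=1.0):
    """Hypothetical species tree-based cost: weighted duplications + losses
    under the LCA reconciliation of gene_tree into species_tree."""
    sc = clusters(species_tree)
    dups = losses = 0

    def walk(g):
        nonlocal dups, losses
        if isinstance(g, str):
            return lca(sc, frozenset({gene_to_species[g]}))
        child_maps = [walk(c) for c in g]
        m = lca(sc, frozenset().union(*child_maps))
        # A node is a duplication if it maps to the same species node as a child.
        is_dup = any(cm == m for cm in child_maps)
        if is_dup:
            dups += 1
        for cm in child_maps:
            # Species nodes skipped on the way down to the child imply losses.
            losses += depth(sc, cm) - depth(sc, m) - (0 if is_dup else 1)
        return m

    walk(gene_tree)
    return dup_w * dups + loss_w * losses


def pick_tree(candidates, species_tree, gene_to_species, tol=2.0):
    """candidates: list of (tree, log_likelihood). Keep trees whose
    log-likelihood is within tol of the best (assumed stand-in for a
    statistical test), then return the one with the lowest cost."""
    best_ll = max(ll for _, ll in candidates)
    equivalent = [t for t, ll in candidates if best_ll - ll <= tol]
    return min(equivalent, key=lambda t: dup_loss_cost(t, species_tree, gene_to_species))


species = (("A", "B"), "C")
g2s = {"a1": "A", "b1": "B", "c1": "C"}
ml_tree = (("a1", "c1"), "b1")   # hypothetical ML topology
alt_tree = (("a1", "b1"), "c1")  # slightly less likely, matches species tree
cands = [(ml_tree, -100.0), (alt_tree, -100.8)]
print(pick_tree(cands, species, g2s, tol=2.0))
```

In this toy case the maximum-likelihood topology groups a1 with c1 and costs one duplication and three losses, while the slightly less likely tree that agrees with the species tree costs nothing, so it is returned whenever it falls within the tolerance; with a tighter tolerance (say 0.5) only the ML tree qualifies and it is kept unchanged.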