Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology.
Mol Biol Evol. 2011 Jan;28(1):273-90. doi: 10.1093/molbev/msq189. Epub 2010 Jul 25.
Recent sequencing and computing advances have enabled phylogenetic analyses to expand to both entire genomes and large clades, thus requiring more efficient and accurate methods designed specifically for the phylogenomic context. Here, we present SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss (DL) rates, speciation times, and correlated substitution rate variation across both species and loci. We implemented this method and applied it to two clades of fully sequenced species (12 Drosophila and 16 fungal genomes) as well as to simulated phylogenies, and we find dramatic improvements in reconstruction accuracy compared with the most popular existing methods, including those that take the species tree into account. We find that reconstruction inaccuracies in traditional phylogenetic methods lead to overestimating the number of DL events by as much as 2- to 3-fold, whereas our method achieves significantly higher accuracy. We believe that the results and methods presented here will have many important implications for future investigations of gene evolution.
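To illustrate why an inaccurate gene tree inflates the number of inferred DL events, the following minimal sketch counts duplications and losses by standard parsimony (LCA) reconciliation of a gene tree against a known species tree. This is not SPIMAP's Bayesian model. It is the common reconciliation step that can be applied to any reconstructed gene tree. The function names (`index_species_tree`, `lca`, `reconcile`, `count_dl`), the nested-tuple tree encoding, and the three-species toy trees are all assumptions made for this example and do not come from the paper.

```python
from typing import Dict, FrozenSet, Tuple, Union

# A tree is either a leaf (species name) or a pair of subtrees.
# Gene-tree leaves are labeled by the species each gene was sampled from.
Tree = Union[str, Tuple["Tree", "Tree"]]
Clade = FrozenSet[str]


def index_species_tree(tree: Tree, depths: Dict[Clade, int], depth: int = 0) -> Clade:
    """Record every species-tree node (identified by its leaf set) with its depth."""
    if isinstance(tree, str):
        clade = frozenset([tree])
    else:
        clade = (index_species_tree(tree[0], depths, depth + 1)
                 | index_species_tree(tree[1], depths, depth + 1))
    depths[clade] = depth
    return clade


def lca(species: Clade, depths: Dict[Clade, int]) -> Clade:
    """Lowest common ancestor: the deepest species-tree node containing all species."""
    return max((c for c in depths if species <= c), key=lambda c: depths[c])


def reconcile(gene_tree: Tree, depths: Dict[Clade, int]) -> Tuple[Clade, int, int]:
    """Return (species node this gene node maps to, duplications, losses)."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree]), 0, 0
    left, dups_l, loss_l = reconcile(gene_tree[0], depths)
    right, dups_r, loss_r = reconcile(gene_tree[1], depths)
    node = lca(left | right, depths)
    # A gene node is a duplication if it maps to the same species node as a child.
    is_dup = node in (left, right)
    dups = dups_l + dups_r + (1 if is_dup else 0)
    losses = loss_l + loss_r
    for child in (left, right):
        gap = depths[child] - depths[node]
        # Each species-tree node skipped between parent and child implies one loss.
        losses += gap if is_dup else gap - 1
    return node, dups, losses


def count_dl(gene_tree: Tree, species_tree: Tree) -> Tuple[int, int]:
    """Count (duplications, losses) implied by a gene tree under LCA reconciliation."""
    depths: Dict[Clade, int] = {}
    index_species_tree(species_tree, depths)
    _, dups, losses = reconcile(gene_tree, depths)
    return dups, losses


if __name__ == "__main__":
    species = (("A", "B"), "C")
    congruent = (("A", "B"), "C")   # gene tree matching the species tree
    misplaced = (("A", "C"), "B")   # same genes, one branch reconstructed wrongly
    print(count_dl(congruent, species))
    print(count_dl(misplaced, species))
```

In this toy case the congruent gene tree needs no events, while the topology with one misplaced branch can only be explained by one duplication and three losses, even though the gene family contains a single gene per species. Errors of this kind, accumulated over many gene families, are one way a reconstruction method can inflate DL counts.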