School of Computer Sciences, Tel Aviv University, Ramat Aviv, Israel.
Genome Res. 2010 Jan;20(1):122-32. doi: 10.1101/gr.096115.109. Epub 2009 Nov 30.
Inferring the gene content of ancestral genomes is a fundamental challenge in molecular evolution. Because the problem is statistical in nature, ancestral genomes inferred by maximum-likelihood (ML) or maximum-parsimony (MP) methods are prone to considerable error rates. In general, these errors are difficult to eliminate by using longer genomic sequences or by analyzing more taxa. This study describes a new approach to ancestral genome reconstruction, the ancestral coevolver (ACE), which uses coevolutionary information to improve the accuracy of such reconstructions over previous approaches. The principal idea is to reduce the potentially large solution space by choosing a single optimal (or near-optimal) solution that is in accord with the coevolutionary relationships between protein families. Simulation experiments on both artificial and real biological data show that ACE yields a marked decrease in error rate compared with ML or MP. When ACE was applied to a large data set (95 organisms, 4873 protein families, and 10,000 coevolutionary relationships), some of the ancestral genomes it reconstructed differed remarkably in gene content from those reconstructed by ML or MP alone (by more than 10% at some nodes). These reconstructions had nearly the same likelihood/parsimony scores as those obtained with ML/MP, yet markedly higher concordance with the coevolutionary information. Specifically, when ACE was used to improve the results of ML, it added a large number of proteins to those encoded by LUCA (the last universal common ancestor), most of them ribosomal proteins and components of the F(0)F(1)-type ATP synthase/ATPases, complexes that are vital in most living organisms. Our analysis suggests that LUCA was bacterial-like and had a genome size similar to those of many extant organisms.
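The principal idea, choosing among equally good reconstructions the one that best agrees with coevolving families, can be illustrated with a toy sketch. This is not the ACE algorithm itself: the four-leaf tree, the brute-force enumeration of maximum-parsimony solutions, the simple "same state at each internal node" agreement score, and all names (`PARENT`, `optimal_reconstructions`, `pick_concordant`) are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical toy tree (child -> parent); leaves are A, B, C, D.
PARENT = {"A": "n1", "B": "n1", "C": "n2", "D": "n2",
          "n1": "root", "n2": "root"}
INTERNAL = ["root", "n1", "n2"]


def parsimony_cost(states):
    """Count branches whose two ends differ in presence (1) / absence (0)."""
    return sum(states[child] != states[parent]
               for child, parent in PARENT.items())


def optimal_reconstructions(leaf_states):
    """Return every internal-node assignment with minimal parsimony cost.

    Brute force over all 2^3 assignments; only viable for a toy tree.
    """
    scored = []
    for values in product((0, 1), repeat=len(INTERNAL)):
        ancestors = dict(zip(INTERNAL, values))
        cost = parsimony_cost({**leaf_states, **ancestors})
        scored.append((cost, ancestors))
    best = min(cost for cost, _ in scored)
    return [anc for cost, anc in scored if cost == best]


def agreement(anc_x, anc_y):
    """Assumed concordance score: internal nodes where both families match."""
    return sum(anc_x[node] == anc_y[node] for node in INTERNAL)


def pick_concordant(leaves_x, leaves_y):
    """Among equally parsimonious pairs, pick the most concordant one."""
    candidates = product(optimal_reconstructions(leaves_x),
                         optimal_reconstructions(leaves_y))
    return max(candidates, key=lambda pair: agreement(*pair))


# Family X is ambiguous under parsimony; family Y (assumed to coevolve
# with X) is present in every leaf and therefore unambiguous.
family_x = {"A": 1, "B": 0, "C": 1, "D": 0}
family_y = {"A": 1, "B": 1, "C": 1, "D": 1}

anc_x, anc_y = pick_concordant(family_x, family_y)
print(len(optimal_reconstructions(family_x)), anc_x)
```

In this toy case, parsimony alone leaves two equally good reconstructions for family X (present at every ancestor, or absent at every ancestor). The coevolving family Y breaks the tie in favor of presence, which mirrors in miniature how coevolutionary information can add proteins to an inferred ancestor without worsening the parsimony score.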