Wapinski Ilan, Pfeffer Avi, Friedman Nir, Regev Aviv
Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Bioinformatics. 2007 Jul 1;23(13):i549-58. doi: 10.1093/bioinformatics/btm193.
Gene duplication and divergence is a major evolutionary force. Despite the growing number of fully sequenced genomes, methods for investigating these events on a genome-wide scale are still in their infancy. Here, we present SYNERGY, a novel and scalable algorithm that uses sequence similarity and a given species phylogeny to reconstruct the underlying evolutionary history of all genes in a large group of species. In doing so, SYNERGY resolves homology relations and accurately distinguishes orthologs from paralogs. We applied our approach to a set of nine fully sequenced fungal genomes spanning 150 million years, generating a genome-wide catalog of orthologous groups and corresponding gene trees. Our results are highly accurate when compared to a manually curated gold standard and, according to a novel jackknife confidence score, are robust to the quality of the input. The reconstructed gene trees provide a comprehensive view of gene evolution on a genomic scale. Our approach can be applied to any set of sequenced eukaryotic species with a known phylogeny, and opens the way to systematic studies of the evolution of individual genes, molecular systems and whole genomes.
Supplementary data are available at Bioinformatics online.
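To make the ortholog/paralog distinction in the abstract concrete, the following is a minimal sketch that labels gene pairs from an already reconstructed gene tree. It is not the SYNERGY algorithm: it uses the simple species-overlap heuristic, in which an internal node is treated as a duplication if the same species appears on both sides and as a speciation otherwise. The tree, the `species|gene` leaf naming and all function names are hypothetical and chosen only for illustration.

```python
# Toy gene tree as nested 2-tuples; leaves are "species|gene" strings.
# Hypothetical data: one duplication in the ancestor of scer and spar.
TREE = (
    (("scer|A1", "spar|A1"), ("scer|A2", "spar|A2")),
    "calb|A",
)


def species_of(leaf):
    """Return the species prefix of a 'species|gene' leaf label."""
    return leaf.split("|")[0]


def leaves(node):
    """Collect all leaf labels below a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def is_duplication(node):
    """Species-overlap heuristic: a shared species on both sides implies duplication."""
    left, right = node
    left_species = {species_of(x) for x in leaves(left)}
    right_species = {species_of(x) for x in leaves(right)}
    return bool(left_species & right_species)


def classify_pairs(tree):
    """Label every gene pair by the event at its last common ancestor."""
    relations = {}

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node
        kind = "paralogs" if is_duplication(node) else "orthologs"
        # Pairs split across this node have it as their last common ancestor.
        for a in leaves(left):
            for b in leaves(right):
                relations[frozenset((a, b))] = kind
        visit(left)
        visit(right)

    visit(tree)
    return relations


if __name__ == "__main__":
    for pair, kind in sorted(classify_pairs(TREE).items(), key=lambda kv: sorted(kv[0])):
        print(" / ".join(sorted(pair)), "->", kind)
```

In this toy tree, `scer|A1` and `spar|A1` come out as orthologs, while `scer|A1` and `scer|A2` (or `spar|A2`) come out as paralogs because their last common ancestor is the duplication node. A method that also takes a species phylogeny as input, as the abstract describes, can place such events on specific branches of that phylogeny, which this heuristic does not attempt.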