Univ Montpellier2, UMR AMAP, Montpellier F-34000, France.
Bioinformatics. 2012 Sep 15;28(18):i382-i388. doi: 10.1093/bioinformatics/bts374.
Most models of genome evolution integrating gene duplications, losses and chromosomal rearrangements are computationally intractable, even when comparing only two genomes. This prevents large-scale studies that consider different types of genome structural variations.
We define an 'adjacency phylogenetic tree' that describes the evolution of an adjacency, a neighborhood relation between two genes, by speciation, duplication or loss of one or both genes, and rearrangement. We describe an algorithm that, given a species tree and a set of gene trees where the leaves are connected by adjacencies, computes an adjacency forest that minimizes the number of gains and breakages of adjacencies (caused by rearrangements) and runs in polynomial time. We use this algorithm to reconstruct contiguous regions of mammalian and plant ancestral genomes in a few minutes for a dozen species and several thousand genes. We show that this method yields reduced conflict between ancestral adjacencies. We detect duplications involving several genes and compare the different modes of evolution between phyla and among lineages.
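To make the optimization criterion concrete, the following minimal Python sketch counts adjacency gains and breakages along a single branch between an ancestral and a descendant gene order. It is an illustration under simplifying assumptions, not the algorithm of the paper: it ignores duplications, takes the ancestral adjacencies as already known, and treats an adjacency whose gene is lost as neither gained nor broken. The names `adjacency`, `count_gains_and_breaks` and `descendant_of` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Adjacency = Tuple[str, str]


def adjacency(a: str, b: str) -> Adjacency:
    """Return an unordered adjacency between two genes as a sorted pair."""
    return (a, b) if a <= b else (b, a)


def count_gains_and_breaks(
    parent_adjs: Set[Adjacency],
    child_adjs: Set[Adjacency],
    descendant_of: Dict[str, Optional[str]],
) -> Tuple[int, int]:
    """Count (gains, breaks) along one branch.

    Assumption: descendant_of maps each parent gene to its single
    descendant gene in the child, or to None if the gene was lost
    (no duplications are modelled in this sketch).
    """
    inherited: Set[Adjacency] = set()
    breaks = 0
    for g1, g2 in parent_adjs:
        d1, d2 = descendant_of.get(g1), descendant_of.get(g2)
        if d1 is None or d2 is None:
            # Adjacency disappears through gene loss, not rearrangement.
            continue
        image = adjacency(d1, d2)
        if image in child_adjs:
            inherited.add(image)
        else:
            breaks += 1  # both genes survive but are no longer neighbors
    # Child adjacencies with no ancestral counterpart are gains.
    gains = len(child_adjs - inherited)
    return gains, breaks


# Toy example: ancestor order A-B-C-D; gene D is lost in the descendant.
parent = {adjacency("A", "B"), adjacency("B", "C"), adjacency("C", "D")}
child = {adjacency("a", "b"), adjacency("a", "c")}
mapping = {"A": "a", "B": "b", "C": "c", "D": None}
result = count_gains_and_breaks(parent, child, mapping)
```

In this toy case the adjacency A-B is conserved, B-C is broken, C-D vanishes with the loss of D, and a-c is gained. The method described above instead explores all adjacency histories consistent with the gene trees and the species tree, and selects one that minimizes the total of such gains and breakages.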
A C++ implementation using the BIO++ package is available upon request from Sèverine Bérard.
Severine.Berard@cirad.fr or Eric.Tannier@inria.fr
Supplementary material is available at Bioinformatics online.