Rocha Eduardo P C
Unité Génétique des Génomes Bactériens, Institut Pasteur, Paris, France and Atelier de BioInformatique, Université Pierre et Marie Curie (Paris VI), Paris, France.
Mol Biol Evol. 2006 Mar;23(3):513-22. doi: 10.1093/molbev/msj052. Epub 2005 Nov 9.
The stability of genomes is highly variable, both in terms of gene content and gene order. Here I calibrate the loss of gene order conservation (GOC) through time by fitting a simple probabilistic model on pairwise comparisons involving 126 bacterial genomes. The model computes the probability of separation of pairs of contiguous genes per unit of time and fits the data better than previous models, while allowing a mechanistic interpretation of the loss of GOC with time. Although information on operons is not used in the model, I observe, as expected, that most highly conserved pairs of genes are indeed within operons. However, even the remaining pairs are much more conserved than expected from experimentally observed rearrangement rates. After 500 Myr, about 50% of the originally contiguous orthologues remain contiguous in the average genome. Hence, the large majority of rearrangements must be deleterious, and random genome rearrangements are unlikely to be a source of positively selected structural changes. I then use the deviations from the model to define an intrinsic measure of genome stability that allows the comparison of distantly related genomes and the inference of ancestral states. This shows that clades differ in genome stability, with cyanobacteria being the least stable and gamma-proteobacteria the most stable. Without correction for phylogeny, free-living bacteria are the least stable group of genomes, followed by pathogens, and then endomutualists. However, after correction for phylogenetic inertia (or the removal of cyanobacteria from the analysis), there is no significant association between genome stability and lifestyle or genome size. Hence, although this method has uncovered some of the mechanisms leading to rearrangements, we still do not know which forces shape selection on genome stability differently across species.
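The calibration can be illustrated with a minimal sketch under a deliberately simplified assumption: every originally contiguous orthologue pair separates independently at a constant rate `lam` per Myr and never rejoins, so the expected fraction still contiguous decays exponentially. This is not the fitted model from the paper, and the function names and the residual-based `stability_score` are hypothetical illustrations of "deviations from the model," not the paper's actual measure. Under this assumption, the figure of about 50% after 500 Myr implies a rate of ln 2 / 500, or roughly 0.0014 per Myr.

```python
import math

# ASSUMPTION (not the paper's fitted model): each contiguous orthologue pair
# separates independently at a constant rate `lam` per Myr and never rejoins,
# so the expected fraction still contiguous after t Myr is exp(-lam * t).

def rate_from_half_life(t_half_myr: float) -> float:
    """Separation rate implied by the time at which half the pairs remain contiguous."""
    return math.log(2) / t_half_myr

def expected_goc(t_myr: float, lam: float) -> float:
    """Expected fraction of originally contiguous pairs still contiguous after t_myr."""
    return math.exp(-lam * t_myr)

def stability_score(observed_goc: float, t_myr: float, lam: float) -> float:
    """HYPOTHETICAL stability measure: observed minus expected GOC.
    Positive values mean a genome pair kept more contiguity than the model predicts."""
    return observed_goc - expected_goc(t_myr, lam)

# Calibrate from the abstract's figure: ~50% contiguous after 500 Myr.
lam = rate_from_half_life(500.0)
print(round(lam, 6))                        # 0.001386
print(round(expected_goc(500.0, lam), 3))   # 0.5
print(round(expected_goc(1000.0, lam), 3))  # 0.25
```

Under this single-rate assumption, doubling the divergence time squares the surviving fraction (0.5 at 500 Myr, 0.25 at 1000 Myr). A model that also lets some pairs, such as those within operons, be far more conserved than others would decay more slowly at long times.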