Huynen M A, Bork P
European Molecular Biology Laboratory, Meyerhofstrasse 1, 69012 Heidelberg, Germany, and Max-Delbrück-Centrum for Molecular Medicine, 13122 Berlin-Buch, Germany.
Proc Natl Acad Sci U S A. 1998 May 26;95(11):5849-56. doi: 10.1073/pnas.95.11.5849.
The determination of complete genome sequences provides us with an opportunity to describe and analyze evolution at the comprehensive level of genomes. Here we compare nine genomes with respect to their protein coding genes at two levels: (i) we compare genomes as "bags of genes" and measure the fraction of orthologs shared between genomes and (ii) we quantify correlations between genes with respect to their relative positions in genomes. Distances between the genomes are related to their divergence times, measured as the number of amino acid substitutions per site in a set of 34 orthologous genes that are shared among all the genomes compared. We establish a hierarchy of rates at which genomes have changed during evolution. Protein sequence identity is the most conserved, followed by the complement of genes within the genome. Next is the degree of conservation of the order of genes, whereas gene regulation appears to evolve at the highest rate. Finally, we show that some genomes are more highly organized than others: they show a higher degree of clustering of genes that have orthologs in other genomes.
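The two levels of comparison named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the exact definitions, so the normalization by the smaller genome, the "adjacent pair" notion of gene order, and the names `shared_ortholog_fraction`, `conserved_neighbor_fraction`, and the family labels `f1`..`f6` are all assumptions made for illustration. Each genome is represented as an ordered list of ortholog family identifiers, with each family occurring at most once.

```python
from typing import FrozenSet, List, Set


def shared_ortholog_fraction(genome_a: List[str], genome_b: List[str]) -> float:
    """Treat genomes as "bags of genes" and return the fraction of shared families.

    Assumption: the count is normalized by the smaller of the two genomes.
    """
    families_a, families_b = set(genome_a), set(genome_b)
    smaller = min(len(families_a), len(families_b))
    if smaller == 0:
        return 0.0
    return len(families_a & families_b) / smaller


def adjacent_pairs(genome: List[str]) -> Set[FrozenSet[str]]:
    """Return the unordered pairs of families that are neighbors in the gene order."""
    return {frozenset(pair) for pair in zip(genome, genome[1:])}


def conserved_neighbor_fraction(genome_a: List[str], genome_b: List[str]) -> float:
    """Fraction of neighboring shared-gene pairs in A that are also neighbors in B.

    Assumption: only pairs in which both genes have an ortholog in B are counted,
    and gene orientation is ignored.
    """
    shared = set(genome_a) & set(genome_b)
    pairs_a = {pair for pair in adjacent_pairs(genome_a) if pair <= shared}
    if not pairs_a:
        return 0.0
    return len(pairs_a & adjacent_pairs(genome_b)) / len(pairs_a)


if __name__ == "__main__":
    # Hypothetical toy genomes, written as ordered lists of ortholog families.
    genome_a = ["f1", "f2", "f3", "f4", "f5"]
    genome_b = ["f2", "f1", "f4", "f6", "f3"]
    print(shared_ortholog_fraction(genome_a, genome_b))
    print(conserved_neighbor_fraction(genome_a, genome_b))
```

In the toy data, four of five families are shared but only one of the three neighboring shared pairs keeps its adjacency, which mirrors the abstract's claim that gene order is less conserved than gene content.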