Grishin N V, Wolf Y I, Koonin E V
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA.
Genome Res. 2000 Jul;10(7):991-1000. doi: 10.1101/gr.10.7.991.
The accumulation of complete genome sequences of diverse organisms creates new possibilities for evolutionary inferences from whole-genome comparisons. In the present study, we analyze the distributions of substitution rates among proteins encoded in 19 complete genomes (the interprotein rate distribution). To estimate these rates, it is necessary to employ another fundamental distribution, that of the substitution rates among sites in proteins (the intraprotein distribution). Using two independent approaches, we show that intraprotein substitution rate variability appears to be significantly greater than generally accepted. This yields more realistic estimates of evolutionary distances from amino-acid sequences, which is critical for evolutionary-tree construction. We demonstrate that the interprotein rate distributions inferred from the genome-to-genome comparisons are similar to each other and can be approximated by a single distribution with a long exponential shoulder. This suggests that a generalized version of the molecular clock hypothesis may be valid on a genome scale. We also use the scaling parameter of the obtained interprotein rate distribution to construct a rooted whole-genome phylogeny. The topology of the resulting tree is largely compatible with those of global rRNA-based trees and trees produced by other approaches to genome-wide comparison.
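The abstract's point that greater intraprotein rate variability changes distance estimates can be illustrated with a small sketch. The abstract does not state which intraprotein distribution the authors use, so the code below assumes the common textbook model in which site rates follow a gamma distribution with shape parameter `alpha` and mean 1. Under that assumption, the expected fraction of unchanged sites is (1 + d/alpha)^(-alpha), which inverts to d = alpha * ((1 - p)^(-1/alpha) - 1). The function names and the example sequences are hypothetical and serve only as illustration; this is not presented as the authors' procedure.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(p: float) -> float:
    """Distance assuming every site evolves at the same rate."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


def gamma_distance(p: float, alpha: float) -> float:
    """Distance assuming gamma-distributed site rates (ASSUMED model).

    alpha is the gamma shape parameter; smaller alpha means stronger
    rate variability among sites and a larger corrected distance.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    p = p_distance("ACDE-FGHIK", "ACDQGFGHLK")
    print("p-distance:", p)
    print("uniform-rate distance:", poisson_distance(p))
    for alpha in (2.0, 1.0, 0.5):
        print("alpha =", alpha, "distance:", gamma_distance(p, alpha))
```

For a fixed observed fraction of differences, lowering `alpha` raises the estimated distance, and for very large `alpha` the estimate approaches the uniform-rate value. This is the sense in which underestimating rate variability among sites would lead to underestimated evolutionary distances.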