Nekrutenko A, Li WH
Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA.
Genome Res. 2000 Dec;10(12):1986-95. doi: 10.1101/gr.10.12.1986.
Using large amounts of long genomic sequence, we studied the compositional patterns of eukaryotic genomes. We developed a simple measure, the compositional heterogeneity (or variability) index, to compare the extent of compositional heterogeneity among long genomic sequences. The index measures the average difference in GC content between two adjacent windows, normalized by the standard error expected if nucleotides were randomly distributed within a window. We report the following findings: (1) In all multicellular eukaryotes studied, the extent of compositional heterogeneity in a genomic sequence correlates strongly with its GC content, regardless of genome size. (2) The human genome appears to be highly compositionally heterogeneous both within and between individual chromosomes; this heterogeneity goes far beyond the predictions of the isochore model. (3) All genomes of multicellular eukaryotes examined in this study are compositionally heterogeneous, although they also contain compositionally uniform segments, or isochores. (4) What truly distinguishes the human (or mammalian) genome is the presence of very GC-rich regions, which exhibit unusually high compositional heterogeneity and contain few long homogeneous segments (isochores). In general, GC-poor isochores tend to be longer than GC-rich ones. These findings indicate that the genomes of multicellular organisms are much more heterogeneous in nucleotide composition than the isochore model depicts, and therefore call for a looser definition of isochores.
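The abstract defines the heterogeneity index only in words. The Python function below is a minimal sketch of one plausible reading of that definition, not the authors' implementation. Several choices are assumptions made for illustration: the use of non-overlapping windows, the absolute value of each GC difference, and a binomial standard error based on the pooled GC fraction p of the two adjacent windows, SE = sqrt(2p(1 - p)/w) for windows of w bases. The names `gc_fraction` and `heterogeneity_index` are hypothetical.

```python
import math
from statistics import mean


def gc_fraction(window: str) -> float:
    """Fraction of G or C bases in a window (assumes only A/C/G/T)."""
    return sum(base in "GC" for base in window) / len(window)


def heterogeneity_index(seq: str, window: int) -> float:
    """Hypothetical sketch of a compositional heterogeneity index.

    Assumption: `seq` is split into non-overlapping windows of `window`
    bases (a trailing partial window is dropped). Each absolute GC
    difference between adjacent windows is divided by the standard error
    of that difference expected under random (binomial) placement of
    nucleotides, using the pooled GC fraction of the two windows.
    Returns the mean of these normalized differences.
    """
    if window <= 0:
        raise ValueError("window size must be positive")
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sketch assumes an unambiguous A/C/G/T sequence")
    n_windows = len(seq) // window
    if n_windows < 2:
        raise ValueError("need at least two full windows")

    gc = [gc_fraction(seq[i * window:(i + 1) * window]) for i in range(n_windows)]

    scores = []
    for left, right in zip(gc, gc[1:]):
        p = (left + right) / 2  # pooled GC fraction of the adjacent pair
        # SE of the difference between two independent binomial proportions
        se = math.sqrt(2 * p * (1 - p) / window)
        if se == 0:
            continue  # both windows all-GC or all-AT: difference is exactly 0
        scores.append(abs(left - right) / se)
    return mean(scores) if scores else 0.0
```

Under this reading, each normalized difference in a randomly composed sequence behaves approximately like |Z| for a standard normal Z, so the index should lie near sqrt(2/pi) ≈ 0.8. Values well above that would indicate heterogeneity beyond random expectation, which is the kind of comparison the abstract relies on.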