MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, United Kingdom.
Genome Res. 2010 Oct;20(10):1335-43. doi: 10.1101/gr.108795.110. Epub 2010 Aug 6.
Despite the availability of dozens of animal genome sequences, two key questions remain unanswered: First, what fraction of any species' genome confers biological function, and second, are apparent differences in organismal complexity reflected in an objective measure of genomic complexity? Here, we address both questions by applying, across the mammalian phylogeny, an evolutionary model that estimates the amount of functional DNA that is shared between two species' genomes. Our main finding is that, as the divergence between mammalian species increases, the predicted amount of pairwise shared functional sequence drops off dramatically. We show by simulations that this is not an artifact of the method, but rather indicates that functional (and mostly noncoding) sequence is turning over at a very high rate. We estimate that between 200 and 300 Mb (~6.5%-10%) of the human genome is under functional constraint, which includes five to eight times as many constrained noncoding bases as bases that code for protein. In contrast, in D. melanogaster we estimate only 56-66 Mb to be constrained, implying a ratio of noncoding to coding constrained bases of about 2. This suggests that, rather than genome size or protein-coding gene complement, it is the number of functional bases that might best mirror our naïve preconceptions of organismal complexity.
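The arithmetic behind these figures can be checked with a short sketch. It is not the evolutionary model used in the study. It only restates the abstract's numbers, and it rests on two assumptions that the abstract does not state: a human genome size of about 3,080 Mb, and a pairing of the lower constrained total with the lower noncoding-to-coding ratio (and the upper total with the upper ratio). The function names are hypothetical.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# ASSUMPTION: human genome size of ~3,080 Mb (not given in the abstract).
HUMAN_GENOME_MB = 3080.0


def constrained_fraction(constrained_mb: float, genome_size_mb: float) -> float:
    """Fraction of the genome that is under functional constraint."""
    return constrained_mb / genome_size_mb


def implied_coding_mb(total_constrained_mb: float, noncoding_to_coding: float) -> float:
    """Constrained coding sequence implied by a total and a noncoding:coding ratio.

    If noncoding = ratio * coding, then total = coding * (1 + ratio).
    """
    return total_constrained_mb / (1.0 + noncoding_to_coding)


if __name__ == "__main__":
    # Human: 200-300 Mb constrained, noncoding:coding ratio of 5 to 8.
    # ASSUMPTION: low total pairs with low ratio, high total with high ratio.
    for total_mb, ratio in [(200.0, 5.0), (300.0, 8.0)]:
        frac = constrained_fraction(total_mb, HUMAN_GENOME_MB)
        coding = implied_coding_mb(total_mb, ratio)
        print(f"human: {total_mb:.0f} Mb = {frac:.1%} of genome, "
              f"implied coding ~ {coding:.1f} Mb")

    # D. melanogaster: 56-66 Mb constrained, ratio of about 2.
    for total_mb in (56.0, 66.0):
        coding = implied_coding_mb(total_mb, 2.0)
        print(f"fly: {total_mb:.0f} Mb constrained, implied coding ~ {coding:.1f} Mb")
```

Under these assumptions, both ends of the human range imply roughly 33 Mb of constrained coding sequence, so the spread in the 200-300 Mb estimate falls almost entirely on noncoding sequence.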