Antezana Marcos A, Jordan I King
Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America.
PLoS One. 2008 May 14;3(5):e2145. doi: 10.1371/journal.pone.0002145.
The content of guanine+cytosine varies markedly along the chromosomes of homeotherms, and great effort has been devoted to studying this heterogeneity and its biological implications. Even before the DNA-sequencing era, however, it was established that the dinucleotides in the DNA of mammals in particular, and of most organisms in general, show striking over- and under-representations that cannot be explained by the base composition. Here we show that in the coding regions of vertebrates both GC content and codon occurrences are strongly correlated with such "motif preferences" even though we quantify the latter using an index that is not affected by the base composition, codon usage, and protein-sequence encoding. These correlations are likely to be the result of the long-term shaping of the primary structure of genic and non-genic DNA by a mutation regime whose central features have been maintained by natural selection. Indeed, we find that these preferences are conserved in vertebrates even more rigidly than codon occurrences, and we show that the occurrence-preference correlations are stronger in intronic and non-genic DNA, with R² values reaching 99% when GC content is approximately 0.5. The mutation regime appears to be characterized by rates that depend markedly on the bases present at the sites immediately preceding and following each mutating site: when we estimate such rates of neighbor-base-dependent mutation (NBDM) from substitutions retrieved from alignments of coding, intronic, and non-genic mammalian DNA sorted and grouped by GC content, they suffice to simulate DNA sequences in which motif occurrences and preferences, as well as the correlations of motif preferences with GC content and with motif occurrences, are very similar to the mammalian ones.
The best fit, however, is obtained with NBDM regimes lacking strand effects, which indicates that over the long term NBDM switches strands in the germline, as one would expect for effects due to loosely contained background transcription. Finally, we show that human coding regions are less mutable under the estimated NBDM regimes than under matched context-independent mutation, and that this entails marked differences between the spectra of amino-acid mutations that each mutation regime should generate. In the Discussion we examine the mechanisms likely to underlie NBDM heterogeneity along chromosomes and propose that it reflects how the diversity and activity of lesion-bypass polymerases (LBPs) track the landscapes of scheduled and non-scheduled genome repair, replication, and transcription during the cell cycle. We conclude that the primary structure of vertebrate genic DNA at and below the trinucleotide level has been governed over the long term by highly conserved regimes of NBDM, which should be under direct natural selection because they drastically alter missense-mutation rates and hence the somatic and the germline mutational loads. Therefore, the non-coding DNA of vertebrates may have been shaped by NBDM only epiphenomenally, with non-genic DNA being affected mainly when found in the proximity of genes.
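The notion of dinucleotide over- and under-representation relative to base composition can be illustrated with the classic observed/expected odds ratio, f(XY) / (f(X) f(Y)). The sketch below is only an illustration of that general idea: it is not the index used in this study, which the abstract describes as additionally unaffected by codon usage and protein-sequence encoding. The function name `dinucleotide_odds_ratios` and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_odds_ratios(seq):
    """Return observed/expected ratios for the 16 dinucleotides of one strand.

    Expected frequency of XY is f(X) * f(Y), i.e. what base composition
    alone would predict. Ratios below 1 indicate under-representation,
    ratios above 1 indicate over-representation.
    Illustrative only: this is the classic odds ratio, not the study's index.
    """
    seq = seq.upper()

    # Single-base counts, ignoring ambiguous symbols such as N.
    base_counts = Counter(b for b in seq if b in BASES)
    n_bases = sum(base_counts.values())

    # Overlapping dinucleotide counts, skipping pairs with ambiguous symbols.
    pair_counts = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_pairs = sum(pair_counts.values())
    if n_pairs == 0:
        raise ValueError("sequence contains no unambiguous dinucleotides")

    ratios = {}
    for x, y in product(BASES, repeat=2):
        expected = (base_counts[x] / n_bases) * (base_counts[y] / n_bases)
        observed = pair_counts[x + y] / n_pairs
        # A base absent from the sequence makes the expectation zero.
        ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    example = "ATGCATTGCACCTGAATGTCCATGGTACA"  # made-up sequence
    for motif, ratio in sorted(dinucleotide_odds_ratios(example).items()):
        print(motif, round(ratio, 2))
```

Because this ratio is computed on a single strand and ignores reading frame, it cannot separate mutational context effects from constraints imposed by codon usage or the encoded protein, which is the confound the study's index is said to avoid.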