Barry Zeeberg
Laboratory of Molecular Pharmacology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.
Genome Res. 2002 Jun;12(6):944-55. doi: 10.1101/gr.213402.
Exonic GC content of human mRNA reference sequences (RefSeqs), as well as the frequencies of A, C, G, and T at codon position 3, are linearly correlated with genomic GC content. These observations draw on the completed human genome sequence and a large, high-quality set of human and mouse coding sequences, and they agree with similar determinations published by others. A Shannon information-theoretic measure of bias in synonymous codon usage was developed. When applied to either human or mouse RefSeqs, this measure is nonlinearly correlated with genomic, exonic, and third-codon-position A, C, G, and T. Information values for orthologous mouse and human RefSeqs are linearly correlated: mouse = 0.092 + 0.55 × human. Mouse genes were consistently located in genomic regions whose GC content was closer to 50% than that of the human ortholog. Because the (nonlinear) curve of information versus percent GC has a minimum at 50% GC and increases monotonically with distance from 50% GC, this placement directly produces the low slope of 0.55. This appears to reflect an evolutionary strategy of placing genes in regions of the genome whose GC content relates synonymous codon bias to protein folding.
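The abstract does not give the formula for the information measure, so the following is only a minimal sketch of one plausible formulation, not the author's method. It assumes that, for each amino acid with k synonymous codons, the bias is the shortfall of the observed codon-usage entropy from its maximum, log2(k) - H, and that these per-amino-acid values are averaged with weights proportional to how often each amino acid occurs. The names `codon_bias_information` and `gc3_fraction` are hypothetical, and single-codon amino acids (Met, Trp) and stop codons are excluded by assumption.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), built from the
# canonical TCAG ordering of the three codon positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of synonymous codons per amino acid (e.g. 6 for Leu, 1 for Met).
DEGENERACY = Counter(CODON_TABLE.values())


def _codons(cds):
    """Yield in-frame codons, ignoring any trailing partial codon."""
    cds = cds.upper().replace("U", "T")
    for i in range(0, len(cds) - len(cds) % 3, 3):
        yield cds[i:i + 3]


def codon_bias_information(cds):
    """Assumed measure: weighted mean over amino acids of log2(k) - H, in bits.

    Returns 0.0 for uniform synonymous usage and grows as usage becomes skewed.
    """
    counts = defaultdict(Counter)
    for codon in _codons(cds):
        aa = CODON_TABLE.get(codon)  # None for codons with ambiguous bases
        if aa is None or aa == "*" or DEGENERACY[aa] == 1:
            continue
        counts[aa][codon] += 1
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    info = 0.0
    for aa, c in counts.items():
        n = sum(c.values())
        entropy = -sum((v / n) * math.log2(v / n) for v in c.values())
        info += (n / total) * (math.log2(DEGENERACY[aa]) - entropy)
    return info


def gc3_fraction(cds):
    """Fraction of in-frame codons whose third base is G or C."""
    thirds = [codon[2] for codon in _codons(cds)]
    if not thirds:
        return 0.0
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    uniform = "GCTGCCGCAGCG"   # alanine, all four codons used once
    skewed = "GCCGCCGCCGCC"    # alanine, a single codon used throughout
    print(codon_bias_information(uniform), gc3_fraction(uniform))
    print(codon_bias_information(skewed), gc3_fraction(skewed))
```

Under these assumptions, a coding sequence that uses synonymous codons evenly scores 0 bits, while one that always uses the same codon for a four-fold degenerate amino acid scores 2 bits. Applying both functions to each sequence in a set would give the (GC3, information) pairs that a curve like the one described in the abstract is drawn from.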