Blake R D, Earley S
Department of Biochemistry, University of Maine, Orono 04469.
J Biomol Struct Dyn. 1986 Oct;4(2):291-307. doi: 10.1080/07391102.1986.10506347.
The mean (G + C) composition (51.0%) and standard deviation (±3.8%) of published DNA sequences accounting for 10% of the E. coli genome are in excellent agreement with the principal overall distribution determined by high-resolution melting. Although differences in base and neighbor characteristics are small and uniform throughout all regions of the genome, the (G + C) content of sequences is found to vary in segmented fashion within boundaries corresponding to coding (53% G + C) and noncoding (46% G + C) regions, with the variance in noncoding regions being six-fold greater than in coding regions. The variance in different regions shows a strong negative dependence on the (G + C) content of the region. This reflects the condition that A-T and G-C base pairs are preferred neighbors of A-T and C-G pairs, respectively, with the bias increasing as (G + C) content decreases. Neighbor analysis indicates that the most extreme positive biases occur in AA, TT, GC and CG throughout all regions, but particularly in noncoding regions. Extraordinary numbers of oligomeric strings such as (A)n are a further consequence of this bias. These and other characteristics point to the existence of inherent biases in neighbor frequencies, levied during replication or repair, which in turn reflect neighbor influences during mutation. The bias in codon usage noted by Grantham and others is seen here as due, in part, to the adaptation of coding sequences to this microenvironment through selection among synonymous codons so as to preserve inherent neighbor biases.
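The sketch below illustrates the kinds of sequence statistics the abstract refers to. It computes the (G + C) fraction of a sequence, the mean and standard deviation of % (G + C) across a set of sequences, a nearest-neighbor bias for each dinucleotide, and counts of homopolymer strings such as (A)n. This is an illustration under stated assumptions, not the authors' published procedure. The following choices are all assumptions made for this example:

- the function names;
- the unweighted per-sequence averaging;
- the single-strand counting;
- the minimum run length of 4;
- the observed/expected ratio rho(XY) = f(XY) / (f(X) f(Y)) as the measure of neighbor bias.

```python
import re
import statistics
from collections import Counter

BASES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of G + C among the unambiguous bases A, C, G, T (case-insensitive)."""
    seq = seq.upper()
    n = sum(seq.count(b) for b in BASES)
    if n == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / n


def gc_summary(seqs: list[str]) -> tuple[float, float]:
    """Mean and sample standard deviation of % (G + C) across sequences.

    Assumption (hypothetical choice): each sequence counts once regardless of
    its length. statistics.stdev needs at least two sequences.
    """
    values = [100.0 * gc_content(s) for s in seqs]
    return statistics.mean(values), statistics.stdev(values)


def neighbor_bias(seq: str) -> dict[str, float]:
    """Observed/expected ratio rho(XY) = f(XY) / (f(X) * f(Y)) per dinucleotide.

    rho > 1 means the neighbor pair XY occurs more often than base composition
    alone predicts (a positive neighbor bias). Assumption: only the strand given
    is counted; a double-strand analysis would also count the reverse complement.
    Pairs involving a base absent from seq are omitted from the result.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono, n_di = sum(mono.values()), sum(di.values())
    ratios: dict[str, float] = {}
    if n_di == 0:
        return ratios
    for x in BASES:
        for y in BASES:
            expected = (mono[x] / n_mono) * (mono[y] / n_mono)
            if expected > 0:
                ratios[x + y] = (di[x + y] / n_di) / expected
    return ratios


def homopolymer_runs(seq: str, base: str = "A", min_len: int = 4) -> int:
    """Count maximal runs (base)n with n >= min_len (min_len is a hypothetical cutoff)."""
    # Builds a pattern such as "A{4,}"; greedy matching returns each run whole.
    pattern = re.compile(f"{re.escape(base.upper())}{{{min_len},}}")
    return len(pattern.findall(seq.upper()))


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))                          # 0.5
    mean, sd = gc_summary(["GGCC", "AATT"])
    print(mean, round(sd, 2))                              # 50.0 70.71
    print(round(neighbor_bias("AAAATTTT")["AA"], 2))       # 1.71
    print(homopolymer_runs("GAAAAGCAAAAAAT"))              # 2
```

On real data, the neighbor ratios for AA, TT, GC and CG are the values one would compare between coding and noncoding segments. Symmetrizing over both strands, or weighting sequences by length, would change the numbers.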