Hess S T, Blake J D, Blake R D
Department of Biochemistry, Microbiology and Molecular Biology, University of Maine, Orono 04469.
J Mol Biol. 1994 Mar 4;236(4):1022-33. doi: 10.1016/0022-2836(94)90009-4.
The pattern of 20,200 point substitutions in the 16 unique neighbor-pair environments has been determined from aligned gene/pseudogene sequences in the current database of human DNA sequences. Substitution rates, representing averages over those for different regions of the genome, are distributed over a 60-fold range with strong biases in particular neighbor-pair environments. The rates for substitutions involving the CG doublet are the most rapid overall, where changes of the C.G pair vary over a tenfold range depending on the type of substitution and the 5' neighbor-pair. In general, the rates are fastest in alternating purine-pyrimidine sequences and slowest in purine.pyrimidine tracts, suggesting that the frequencies of one or both key molecular misadventures that can occur during replication, dNTP misinsertion and transient misalignment, may be associated with structural alternations and flexibility of the backbone. By contrast, purine.pyrimidine tracts are less flexible, less prone to substitution, and therefore their proportions accumulate in sequences over time. Characteristic biases of the content and arrangement of oligonucleotide strings or tuples in all sequence elements, but particularly in non-coding regions, appear to be due to the pattern of different neighbor-dependent substitution rates. Computer simulations of numerous replicative cycles have been carried out with substitutions occurring on the same schedule found in this study for pseudogenes. Statistical analyses of tuple frequencies at periodic intervals during the simulation experiment indicate that sequences slowly change in lexical complexity toward a quasi-equilibrium state that corresponds to that for introns.
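The tallying step described above, counting point substitutions by the neighbors flanking each changed site, can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes that the 16 neighbor-pair environments correspond to the 4 x 4 combinations of 5' and 3' flanking bases, that the gene and pseudogene are already aligned to equal length, and that a site is scored only when both flanking bases are unchanged. The function name `count_context_substitutions` and the key layout are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def count_context_substitutions(gene, pseudogene):
    """Tally point substitutions keyed by (5' neighbor, 3' neighbor, old, new).

    Assumptions (not taken from the paper): the two strings are pre-aligned
    and of equal length; a site is skipped if it or either neighbor is a gap
    or an ambiguous base, or if either neighbor differs between sequences.
    """
    gene, pseudogene = gene.upper(), pseudogene.upper()
    if len(gene) != len(pseudogene):
        raise ValueError("aligned sequences must have equal length")

    counts = Counter()
    # First and last positions lack one neighbor, so they are not scored.
    for i in range(1, len(gene) - 1):
        left, old, right = gene[i - 1], gene[i], gene[i + 1]
        new = pseudogene[i]
        if any(b not in BASES for b in (left, old, right, new)):
            continue
        # Require unchanged neighbors so the environment is unambiguous.
        if pseudogene[i - 1] != left or pseudogene[i + 1] != right:
            continue
        if old != new:
            counts[(left, right, old, new)] += 1
    return counts


def totals_per_environment(counts):
    """Collapse the tally to one total per (5' neighbor, 3' neighbor) pair."""
    totals = Counter()
    for (left, right, _old, _new), n in counts.items():
        totals[(left, right)] += n
    return totals
```

Converting these raw counts to rates would additionally require the number of times each environment occurs in the aligned sequences, which this sketch does not compute.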