Shepherd J C
Proc Natl Acad Sci U S A. 1981 Mar;78(3):1596-600. doi: 10.1073/pnas.78.3.1596.
The periodic variations obtained by correlating the relative positions of purines and pyrimidines (and of the four bases thymine, cytosine, adenine, and guanine) in a wide variety of genomes of wholly or partly known sequence suggest that there may be enough of an earlier comma-free coding system (i.e., only readable in one frame) still present to permit determination of the reading frame and approximate extent of the present protein coding stretches. The characteristics of these variations support the hypothesis that these primitive messages were formed of coding triplets having the form RNY (R = purine; Y = pyrimidine; and N = purine or pyrimidine). The base sequences and reading frames that have a minimal deviation from such a message are still good predictors of actual coding regions and reading frames in spite of the many mutations that have occurred since such a genetic code was last in use. In fact, the right frame for almost all the proteins in a number of viruses and various prokaryotes and eukaryotes is deduced purely from purine/pyrimidine information and not by using the normal start and stop signals.
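The frame-selection idea in the abstract (choose the reading frame whose triplets deviate least from the RNY pattern) can be illustrated with a minimal sketch. This is not the paper's actual procedure, which rests on correlating purine/pyrimidine positions across a sequence. It only scores each of the three frames by the fraction of first and third codon positions that violate RNY. The function names `rny_deviation` and `best_rny_frame` and the scoring rule are assumptions made for illustration.

```python
# Illustrative sketch only: a simple mismatch score, not Shepherd's published method.
PURINES = set("AG")      # R
PYRIMIDINES = set("CT")  # Y


def rny_deviation(seq, frame):
    """Fraction of checked positions that break the RNY pattern in one frame.

    Position 1 of each triplet should be a purine, position 3 a pyrimidine;
    position 2 (N) is unconstrained and is not checked.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    checked = 0
    mismatches = 0
    for i in range(frame, len(seq) - 2, 3):  # complete triplets only
        checked += 2
        mismatches += seq[i] not in PURINES
        mismatches += seq[i + 2] not in PYRIMIDINES
    # Assumption: with no complete triplet, report maximal deviation.
    return mismatches / checked if checked else 1.0


def best_rny_frame(seq):
    """Return (frame, scores), where frame has the lowest RNY deviation."""
    scores = {frame: rny_deviation(seq, frame) for frame in range(3)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    frame, scores = best_rny_frame("GCT" * 5)
    print(frame, scores)
```

On a real genome a score like this would presumably be computed over a sliding window, so that both the frame and the approximate extent of a coding stretch could be read off; the window size would be a further assumption not given in the abstract.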