Ohno S
Beckman Research Institute of the City of Hope, Duarte, CA 91010.
Proc Natl Acad Sci U S A. 1988 Dec;85(24):9630-4. doi: 10.1073/pnas.85.24.9630.
Each coding sequence is a finite resource with respect to the number and composition of its four bases. Accordingly, the excessive recurrence of one base oligomer entails a noticeable underrepresentation of another, so that if the former is the same in most, if not all, coding sequences, the latter must necessarily be the same as well. Indeed, a previous series of studies on 20-odd divergent coding sequences established CTG as one of the most frequently recurring base trimers, if not the most frequent, and this excess was compensated by the underrepresentation of base trimers containing the CG and TA dimers. In this study, I analyzed three additional coding sequences and reanalyzed one previously studied coding sequence. These four, derived from man, a plant, and a fish, had variously lopsided base compositions that were not at all conducive to high recurrence of either the CT dimer or CT and TG. Yet the excess of CT and TG dimers, accompanied by a complementary deficiency of CG and TA dimers, emerged as the common rule. Thus, I propose this as the universal rule of coding sequence construction. The underrepresentation of CG and TA dimers within coding sequences explains why regulatory signals in intergenic spacers are of two kinds: one TA dimer rich, the other CG dimer rich.
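The idea of dimer excess and deficiency relative to base composition can be illustrated with a short calculation. The abstract does not state which statistic was used, so the following is only a minimal sketch under one common assumption: the expected count of a dimer XY is taken as f(X) · f(Y) · (N − 1), where f is the base frequency in the sequence and N its length, and the observed-to-expected ratio is reported. The function name `dimer_odds_ratios` and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dimer_odds_ratios(seq):
    """Return observed/expected ratios for each dimer in a DNA sequence.

    Assumption (not from the source): the expected count of dimer XY is
    f(X) * f(Y) * (N - 1), i.e. bases are treated as independent draws
    from the sequence's own base composition. Ratios above 1 suggest
    excess; ratios below 1 suggest deficiency.
    """
    seq = seq.upper()
    if len(seq) < 2 or any(base not in BASES for base in seq):
        raise ValueError("sequence must contain at least two bases from A, C, G, T")

    n = len(seq)
    base_freq = {base: seq.count(base) / n for base in BASES}

    # Count overlapping dimers: positions (0,1), (1,2), ..., (n-2, n-1).
    dimer_counts = Counter(seq[i:i + 2] for i in range(n - 1))

    ratios = {}
    for x, y in product(BASES, repeat=2):
        expected = base_freq[x] * base_freq[y] * (n - 1)
        # Dimers whose expected count is zero (a base is absent) are skipped.
        if expected > 0:
            ratios[x + y] = dimer_counts[x + y] / expected
    return ratios


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    toy = "ATGCTGGCCCTGTGGCTGAAGCTGTAA"
    for dimer, ratio in sorted(dimer_odds_ratios(toy).items()):
        print(f"{dimer}: {ratio:.2f}")
```

Normalizing by the sequence's own base composition matters here because the four sequences in the study had lopsided compositions; a raw dimer count alone would not separate a genuine excess from a simple abundance of one base.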