Yarus M, Folley L S
J Mol Biol. 1985 Apr 20;182(4):529-40. doi: 10.1016/0022-2836(85)90239-6.
The sequence environment of codons in structural genes has been investigated statistically, using computer methods. A set of Escherichia coli genes with abundant products was compared with a set having low gene product levels, in order to detect potential differences associated with expression. The results show striking non-randomness in the nucleotides occurring near codons. These effects are, unexpectedly, much larger and more homogeneous among the genes with rare products. The intensity of effects in weakly expressed genes suggests that such non-random sequence environments decrease expression. In the weakly expressed set of genes, the 5' neighbor of a codon and all positions of the 3' neighbor codon are biased. In the highly expressed genes, the first nucleotide of the next codon is a uniquely affected site. The distribution of non-randomness in weakly expressed genes suggests that sequence bias is primarily due to a constraint acting directly on the secondary or tertiary structure of the codon/anticodon. In highly expressed genes, the observed bias suggests an interaction between the codon/anticodon and a site outside the codon/anticodon. Much of the tendency to non-random near-neighbor sequences in weakly expressed genes can be ascribed to a correlation between nearby nucleotides and the wobble nucleotide of the codon, despite the fact that selection of such correlations will alter the amino acid sequence. The favored pattern, in genes expressed at a low level, is R YYR or Y RRY. R indicates purine, Y indicates pyrimidine; the space is the boundary between codons. It seems likely that this preference for nearby sequences is the physical basis of the genetic context effect. Under this assumption such sequence biases will affect expression. On this basis, we predict new sites for contextual mutations which decrease expression, and suggest a strategy for the design of messages having optimal translational activity.
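The R YYR / Y RRY pattern can be made concrete with a short sketch. The Python below is illustrative only and is not the statistical procedure used in the study: it counts how many codon boundaries in an in-frame coding sequence have a wobble nucleotide followed by a next codon matching one of the two favored patterns. The function names (`to_ry`, `favored_context_counts`) and the example sequence are hypothetical, and the sketch assumes the input contains only A, C, G, T or U and has a length divisible by three.

```python
PURINES = set("AG")
PYRIMIDINES = set("CTU")
# Wobble nucleotide of one codon followed by the three positions of the next codon.
FAVORED = {"RYYR", "YRRY"}


def to_ry(seq):
    """Map each nucleotide to R (purine) or Y (pyrimidine)."""
    classes = []
    for base in seq.upper():
        if base in PURINES:
            classes.append("R")
        elif base in PYRIMIDINES:
            classes.append("Y")
        else:
            raise ValueError(f"unexpected nucleotide: {base!r}")
    return "".join(classes)


def favored_context_counts(cds):
    """Return (hits, boundaries) for an in-frame coding sequence.

    A boundary is the junction between two adjacent codons. A hit is a
    boundary where the wobble base plus the following codon reads
    R YYR or Y RRY in purine/pyrimidine classes.
    """
    ry = to_ry(cds)
    if len(ry) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    hits = 0
    boundaries = 0
    # The wobble base of codon i sits at index 3*i + 2; the last codon has
    # no 3' neighbor codon, so it is skipped.
    for wobble in range(2, len(ry) - 3, 3):
        boundaries += 1
        if ry[wobble:wobble + 4] in FAVORED:
            hits += 1
    return hits, boundaries


if __name__ == "__main__":
    # Hypothetical sequence: codons GCA CTG GCC.
    print(favored_context_counts("GCACTGGCC"))
```

A raw count like this says nothing about significance on its own. Judging whether the pattern is over-represented would require comparing the count with an expectation, for example from the nucleotide composition of the same genes, which this sketch does not attempt.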