Phillips GJ, Arnold J, Ivarie R.
Nucleic Acids Res. 1987 Mar 25;15(6):2627-38. doi: 10.1093/nar/15.6.2627.
As shown in the accompanying paper (5), when oligonucleotides up to 6 bp long are ranked from highest to lowest abundance, the oligonucleotide composition of the E. coli genome is highly asymmetric. We show here that this asymmetry largely reflects codon usage: heavily used codons were found among the most abundant oligomers, whereas rarely used codons, with some exceptions, occurred in low-abundance sequences. Furthermore, linear regression analysis revealed a strong correlation between the frequency of each trinucleotide and its usage as a codon. Dinucleotides are also not randomly distributed across the three codon positions, and the dinucleotide composition of genes that are transcribed but not translated (rRNA and tRNA genes) was closely related to that of genes encoding polypeptides. However, Markov chain analysis identified 45 tetranucleotides, 8 pentanucleotides, and 6 hexanucleotides that were significantly over- or underrepresented and could not be accounted for by codon usage. Many of the underrepresented sequences were palindromes, including the Dam methylation site (GATC).
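The abstract does not spell out the Markov chain procedure. A common formulation estimates the expected count of a k-mer w1..wk from its subwords under a Markov chain of order k-2: E(w1..wk) = N(w1..w(k-1)) · N(w2..wk) / N(w2..w(k-1)). Words whose observed count departs strongly from this expectation are over- or underrepresented beyond what shorter-word composition (including codon-driven trinucleotide bias) predicts. The Python sketch below assumes that formulation. The function names are hypothetical, the sequence is a toy example rather than E. coli data, and the crude (O - E)/sqrt(E) score stands in for a proper significance test, which may differ from the one used in the paper.

```python
from collections import Counter
import math

# Maps each base to its complement (for reverse-complement palindrome checks).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping k-mers on one strand of an ACGT sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def is_palindrome(word: str) -> bool:
    """True if word equals its reverse complement (e.g. the Dam site GATC)."""
    return word == word.translate(COMPLEMENT)[::-1]


def markov_expected(word: str, n_km1: Counter, n_km2: Counter) -> float:
    """Expected count of a k-mer under an order-(k-2) Markov chain:
    E(w1..wk) = N(w1..w_{k-1}) * N(w2..wk) / N(w2..w_{k-1})."""
    middle = word[1:-1]
    if n_km2[middle] == 0:
        return 0.0
    return n_km1[word[:-1]] * n_km1[word[1:]] / n_km2[middle]


def markov_scores(seq: str, k: int) -> dict:
    """Map each observed k-mer to (observed, expected, O/E ratio, rough z)."""
    if k < 3:
        raise ValueError("k must be >= 3 so the Markov order k-2 is at least 1")
    n_k = count_kmers(seq, k)
    n_km1 = count_kmers(seq, k - 1)
    n_km2 = count_kmers(seq, k - 2)
    scores = {}
    for word, observed in n_k.items():
        expected = markov_expected(word, n_km1, n_km2)
        ratio = observed / expected if expected else math.inf
        # Crude Poisson-style score; it ignores the extra variance of
        # overlapping words, so treat it only as a ranking heuristic.
        z = (observed - expected) / math.sqrt(expected) if expected else math.inf
        scores[word] = (observed, expected, ratio, z)
    return scores


if __name__ == "__main__":
    toy = "ACGTACGTACGT"  # hypothetical toy sequence, not genomic data
    for word, (o, e, r, z) in sorted(markov_scores(toy, 3).items()):
        print(word, o, round(e, 2), round(r, 2), "palindrome" if is_palindrome(word) else "")
```

On this strictly periodic toy sequence every trinucleotide's observed count equals its Markov expectation (O/E = 1.0), as expected when no word is favored beyond its subword composition. On a real genome one would run k = 4, 5, 6 and inspect words with extreme O/E values, checking each with `is_palindrome`.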