Smith T F, Waterman M S, Sadler J R
Nucleic Acids Res. 1983 Apr 11;11(7):2205-20. doi: 10.1093/nar/11.7.2205.
It has long been recognized that various genome classes are distinguishable on the basis of base composition and nearest-neighbor frequencies. In addition, Grantham et al. (8) have recently presented evidence that these distinctions are preserved at the level of codon usage. As discussed in this report, it is now clear that these and related statistics can uniquely characterize the various functional domains of the genome. In particular, the peptide-coding, intervening-segment, structural-RNA-coding and mitochondrial domains of the vertebrate genome are uniquely characterizable. The statistical measures not only reflect understood functional differences among these domains but also suggest others. It is somewhat surprising that such simple statistics of nucleic acid sequences reflect so much of the encoded complex pattern information and/or the effects of selective constraints. Here, we investigated the statistical measures most distinctive of the various domains and then related them to our current understanding insofar as possible.
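The three statistics named in the abstract (base composition, nearest-neighbor frequencies, and codon usage) can be illustrated with a minimal Python sketch. This is not the authors' method or code. The function names, the toy sequence, and the choices to ignore non-ACGT characters and to read codons from a fixed frame are all assumptions made for illustration.

```python
from collections import Counter

BASES = "ACGT"  # assumption: DNA alphabet; any other character is ignored


def base_composition(seq):
    """Return the relative frequency of each base A, C, G, T."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    return {b: counts[b] / total for b in BASES} if total else {}


def nearest_neighbor_frequencies(seq):
    """Return relative frequencies of overlapping dinucleotides."""
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if set(p) <= set(BASES))
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}


def codon_usage(seq, frame=0):
    """Return relative frequencies of non-overlapping triplets.

    `frame` (0, 1 or 2) is a hypothetical parameter: the reading frame
    is assumed to be known in advance.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    counts = Counter(c for c in codons if set(c) <= set(BASES))
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


if __name__ == "__main__":
    toy = "ATGGCCATG"  # made-up sequence, not data from the paper
    print(base_composition(toy))
    print(nearest_neighbor_frequencies(toy))
    print(codon_usage(toy))
```

Comparing such frequency tables between sequences drawn from different domains (for example, coding regions versus intervening segments) is one simple way to look for the kind of distinctions the abstract describes. The specific measures and tests used in the paper are not given in the abstract.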