Schuster P
Institut für Molekulare Biotechnologie e.V., Jena, Germany.
J Biotechnol. 1995 Jul 31;41(2-3):239-57. doi: 10.1016/0168-1656(94)00085-q.
The relation between RNA sequences and minimum free energy secondary structures is viewed as a mapping from sequence space into shape space. The properties of such mappings depend strongly on the ratios of the numbers of sequences and structures and, hence, substantial differences are observed between samples of structures derived from AUGC, pure AU or pure GC sequences. Statistical analysis of large samples is used to demonstrate that structures from AUGC sequences are much less sensitive to point mutations than those from sequences containing exclusively AU or GC. The frequency with which a structure is realized in sequence space is inversely proportional to some power c > 1 of the structure's frequency rank, thus following a (generalized) Zipf law. For long sequences the exponent approaches c = 1. An inverse folding algorithm is used to compute samples of sequences folding into the same secondary structure. These sequences are distributed randomly in sequence space. Common structures form extended neutral networks along which populations can migrate through the entire sequence space without changing structure. In this migration, moves of Hamming distance d = 1 and d = 2 are accepted in order to allow for base and base pair exchanges, respectively. Around any arbitrarily chosen sequence a ball that contains sequences folding into all common structures can be drawn. This ball has a diameter that is much smaller than the diameter of sequence space. Hence, only a small fraction of sequence space needs to be searched in order to find a given structure. The results derived from the mapping of sequences into structures are used to suggest a rationale for evolutionary searches on RNA structures: selection cycles with high and low mutation rates applied in alternation. Generalizations of the results to RNA 3-D structures and protein structures are discussed.
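Two quantitative statements in the abstract can be made concrete with a small sketch: the generalized Zipf law for structure frequencies, and the claim that a ball much smaller than sequence space covers only a tiny fraction of it. The code below is illustrative only and is not taken from the paper. The function names, the example exponent c = 1.5, the chain length 100, and the radius 20 are all assumptions chosen for demonstration. The ball size uses the standard count of sequences within a given Hamming distance of a reference sequence over a four-letter alphabet.

```python
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ball_size(n: int, radius: int, alphabet: int = 4) -> int:
    """Count sequences of length n within Hamming distance `radius`
    of a fixed reference sequence (the reference itself included)."""
    return sum(comb(n, k) * (alphabet - 1) ** k for k in range(radius + 1))


def ball_fraction(n: int, radius: int, alphabet: int = 4) -> float:
    """Fraction of the whole sequence space (alphabet**n sequences)
    that lies inside the ball."""
    return ball_size(n, radius, alphabet) / alphabet ** n


def zipf_frequencies(num_structures: int, c: float) -> list[float]:
    """Normalized frequencies proportional to rank**(-c), i.e. a
    generalized Zipf law over ranks 1..num_structures."""
    weights = [rank ** -c for rank in range(1, num_structures + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # A single base exchange is a move of Hamming distance 1.
    print(hamming("GGAUCC", "GGAACC"))
    # Hypothetical numbers: chain length 100, ball radius 20.
    print(ball_fraction(100, 20))
    # Hypothetical exponent c = 1.5 over the 5 most frequent structures.
    print(zipf_frequencies(5, 1.5))
```

For AUGC sequences the diameter of sequence space equals the chain length, so a ball of radius 20 around a sequence of length 100 is already far smaller than the space itself, and the fraction it encloses is correspondingly small. The radius actually needed to reach all common structures is a result of the paper and is not reproduced here.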