Leontis N B, Stombaugh J, Westhof E
Chemistry Department and Center for Biomolecular Sciences, Overman Hall, Bowling Green State University, Bowling Green, OH 43403, USA.
Biochimie. 2002 Sep;84(9):961-73. doi: 10.1016/s0300-9084(02)01463-3.
The traditional way to infer RNA secondary structure involves an iterative process of alignment and evaluation of covariation statistics between all positions possibly involved in basepairing. Watson-Crick basepairs typically show covariations that score well when examples of two or more possible basepairs occur. This is not necessarily the case for non-Watson-Crick basepairing geometries. For example, for sheared (trans Hoogsteen/Sugar edge) pairs, one base is highly conserved (always A or mostly A with some C or U), while the other can vary (G or A and sometimes C and U as well). RNA motifs consist of ordered, stacked arrays of non-Watson-Crick basepairs that in the secondary structure representation form hairpin or internal loops, multi-stem junctions, and even pseudoknots. Although RNA motifs occur recurrently and contribute in a modular fashion to RNA architecture, it is usually not apparent which bases interact and whether it is by edge-to-edge H-bonding or solely by stacking interactions. Using a modular sequence-analysis approach, recurrent motifs related to the sarcin-ricin loop of 23S RNA and to loop E from 5S RNA were predicted in universally conserved regions of the large ribosomal RNAs (16S- and 23S-like) before the publication of high-resolution, atomic-level structures of representative examples of 16S and 23S rRNA molecules in their native contexts. This provides the opportunity to evaluate the predictive power of motif-level sequence analysis, with the goal of automating the process for predicting RNA motifs in genomic sequences. The process of inferring structure from sequence by constructing accurate alignments is a circular one. The crucial link that allows a productive iteration of motif modeling and realignment is the comparison of the sequence variations for each putative pair with the corresponding isostericity matrix to determine which basepairs are consistent both with the sequence and the geometrical data.
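The contrast drawn above, between covariation scoring and checking sequence variation against allowed pair geometries, can be illustrated with a minimal sketch. It rests on several assumptions. First, it uses mutual information between two alignment columns as the covariation statistic; the abstract does not name a specific statistic. Second, it replaces the isostericity matrix with a plain set of allowed base combinations, which ignores the graded similarities a real isostericity matrix encodes. The function names `mutual_information` and `consistent_fraction` and the toy columns are hypothetical. The allowed set for the sheared pair is read off the abstract's own description (conserved A opposite G or A) and is not taken from a published isostericity matrix.

```python
from collections import Counter
from math import log2


def mutual_information(col_i: str, col_j: str) -> float:
    """Covariation score (mutual information, in bits) between two alignment columns.

    Assumption: MI is used here as a stand-in covariation statistic.
    """
    n = len(col_i)
    fi = Counter(col_i)             # base frequencies in column i
    fj = Counter(col_j)             # base frequencies in column j
    fij = Counter(zip(col_i, col_j))  # joint frequencies of (i, j) base pairs
    mi = 0.0
    for (x, y), count in fij.items():
        pxy = count / n
        mi += pxy * log2(pxy / ((fi[x] / n) * (fj[y] / n)))
    return mi


def consistent_fraction(col_i: str, col_j: str, allowed_pairs: set) -> float:
    """Fraction of sequences whose (i, j) bases fall in an allowed set.

    Simplification: membership in a set stands in for a lookup in an
    isostericity matrix, which in reality grades pairs by geometric similarity.
    """
    pairs = list(zip(col_i, col_j))
    return sum(p in allowed_pairs for p in pairs) / len(pairs)


# Canonical Watson-Crick combinations.
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
# Hypothetical allowed set for a sheared pair, following the abstract's
# description: a conserved A opposite a variable G or A.
SHEARED = {("A", "G"), ("A", "A")}

if __name__ == "__main__":
    # Watson-Crick column pair with compensatory changes: covariation scores well.
    print(mutual_information("GCAU", "CGUA"))          # 2.0
    # Sheared pair: one side is invariant A, so MI is zero ...
    print(mutual_information("AAAA", "GAGA"))          # 0.0
    # ... yet every sequence is consistent with the allowed geometry.
    print(consistent_fraction("AAAA", "GAGA", SHEARED))  # 1.0
```

Because an invariant column has zero entropy, its mutual information with any other column is zero. A covariation score alone would therefore miss the sheared pair, whereas the consistency check still supports it.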