Sarver Michael, Zirbel Craig L, Stombaugh Jesse, Mokdad Ali, Leontis Neocles B
Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA.
J Math Biol. 2008 Jan;56(1-2):215-52. doi: 10.1007/s00285-007-0110-x. Epub 2007 Aug 11.
New methods are described for finding recurrent three-dimensional (3D) motifs in RNA atomic-resolution structures. Recurrent RNA 3D motifs are sets of RNA nucleotides with similar spatial arrangements. They can be local or composite. Local motifs comprise nucleotides that occur in the same hairpin or internal loop. Composite motifs comprise nucleotides belonging to three or more different RNA strand segments or molecules. We use a base-centered approach to construct efficient, yet exhaustive search procedures using geometric, symbolic, or mixed representations of RNA structure that we implement in a suite of MATLAB programs, "Find RNA 3D" (FR3D). The first modules of FR3D preprocess structure files to classify base-pairing and base-stacking interactions. Each base is represented geometrically by the position of its glycosidic nitrogen in 3D space and by the rotation matrix that describes its orientation with respect to a common frame. Base-pairing and base-stacking interactions are calculated from the base geometries and are represented symbolically according to the Leontis/Westhof base-pairing classification, extended to include base-stacking. These data are stored and used to organize motif searches. For geometric searches, the user supplies the 3D structure of a query motif, which FR3D uses to find and score geometrically similar candidate motifs, without regard to the sequential position of their nucleotides in the RNA chain or the identity of their bases. To score and rank candidate motifs, FR3D calculates a geometric discrepancy by rigidly rotating candidates to align optimally with the query motif and then comparing the relative orientations of the corresponding bases in the query and candidate motifs. Given the growing size of the RNA structure database, it is impossible to explicitly compute the discrepancy for all conceivable candidate motifs, even for motifs with fewer than ten nucleotides.
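The base representation described above (a position plus a rotation matrix) can be illustrated with a minimal Python sketch. It shows only one ingredient of the scoring, comparing the relative orientation of a pair of bases in the query with that of the corresponding pair in a candidate. It is not FR3D's discrepancy formula, which the abstract does not state, and FR3D itself is written in MATLAB. The `Base` container and every function name here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Base:
    """Hypothetical container: a base reduced to a point and an orientation."""
    center: tuple    # (x, y, z), e.g. the glycosidic nitrogen position
    rotation: list   # 3x3 rotation matrix: base frame -> common frame


def rot_z(theta):
    """Rotation by theta radians about the z axis (used to build test data)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def relative_rotation(b1, b2):
    """Orientation of b2 expressed in b1's own frame: R1^T R2."""
    return matmul(transpose(b1.rotation), b2.rotation)


def rotation_angle(r):
    """Rotation angle in radians, recovered from the trace of r."""
    cos_t = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against round-off


def orientation_mismatch(q1, q2, c1, c2):
    """Angle between the query pair's and the candidate pair's relative orientation."""
    rq = relative_rotation(q1, q2)
    rc = relative_rotation(c1, c2)
    return rotation_angle(matmul(transpose(rq), rc))
```

Because each relative rotation is expressed in the first base's own frame, the mismatch is unchanged when a candidate is rigidly rotated as a whole, which is the property a motif comparison needs.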
The screening algorithm that we describe finds all candidate motifs whose geometric discrepancy with respect to the query motif falls below a user-specified cutoff discrepancy. This technique can be applied to RMSD searches. Candidate motifs identified geometrically may be further screened symbolically to identify those that contain particular base-pair types or base-stacking arrangements or that conform to sequence continuity or nucleotide identity constraints. Purely symbolic searches for motifs containing user-defined sequence, continuity and interaction constraints have also been implemented. We demonstrate that FR3D finds all occurrences, both local and composite and with nucleotide substitutions, of sarcin/ricin and kink-turn motifs in the 23S and 5S ribosomal RNA 3D structures of the H. marismortui 50S ribosomal subunit and assigns the lowest discrepancy scores to bona fide examples of these motifs. The search algorithms have been optimized for speed to allow users to search the non-redundant RNA 3D structure database on a personal computer in a matter of minutes.
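The abstract does not spell out how the screening works, so the following Python sketch shows only one generic way to avoid enumerating every tuple of nucleotides: build candidates one position at a time and discard a partial candidate as soon as one of its pairwise distances disagrees with the query. The function name `screen`, the tolerance `tol`, and the use of pairwise distances between base centers as the pruning criterion are all assumptions made for illustration, not FR3D's published procedure.

```python
import math


def screen(query, points, tol):
    """Return every ordered tuple of indices into `points` whose pairwise
    distances each match the corresponding query distance within `tol`.

    query  : list of (x, y, z) centers for the query motif
    points : list of (x, y, z) centers for all bases in the structure
    """
    m = len(query)
    # Pairwise distances within the query, computed once.
    qd = [[math.dist(a, b) for b in query] for a in query]
    results = []

    def extend(partial):
        k = len(partial)  # query position being filled next
        if k == m:
            results.append(tuple(partial))
            return
        for i in range(len(points)):
            if i in partial:
                continue
            # Prune: the new point must agree with every point chosen so far.
            if all(abs(math.dist(points[i], points[j]) - qd[k][pos]) <= tol
                   for pos, j in enumerate(partial)):
                partial.append(i)
                extend(partial)
                partial.pop()

    extend([])
    return results
```

Matching pairwise distances is a necessary condition only (a mirror image passes it, for example), so in a scheme like this the survivors would still be scored in full and compared against the cutoff. The benefit is that most of the n!/(n-m)! ordered tuples are never constructed.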