Huang Hung-Chung, Nagaswamy Uma, Fox George E
Department of Biology and Biochemistry, Houston Science Center, Room 402, 3201 Cullen Blvd., University of Houston, Houston, TX 77204, USA.
RNA. 2005 Apr;11(4):412-23. doi: 10.1261/rna.7104605.
We have developed a computational approach for comparing and classifying RNA loop structures. Hairpin or interior loops identified in atomic-resolution RNA structures are compared with one another by conformational matching. Root-mean-square deviation (RMSD) values are calculated between all pairs of RNA fragments of interest, including fragments from different molecules. Cluster analysis is then performed on the resulting matrix of RMSD distances using the unweighted pair group method with arithmetic mean (UPGMA), which objectively reveals groups of folds that resemble one another. To demonstrate the utility of the approach, we undertook a comprehensive analysis of all terminal hairpin tetraloops observed in 15 RNA structures determined by X-ray crystallography. The method found major clusters corresponding to the well-known GNRA and UNCG types. In addition, two tetraloops with the unusual primary sequence UMAC (M is A or C) were assigned to the GNRA cluster. Larger loop structures were also examined; the clustering results confirmed that variations of the GNRA and UNCG tetraloops occur within these loops and provided a systematic means of locating them. Nineteen examples of larger loops that closely resemble either the GNRA or UNCG tetraloop were found in the large ribosomal RNAs. When the clustering approach was extended to include all structures in the SCOR database, novel relationships were detected, including one between the ANYA motif and a less common folding of the GAAA tetraloop sequence.
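The pipeline described in the abstract (pairwise RMSD, then UPGMA on the distance matrix) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how fragments are superposed or which atoms are compared, so the Kabsch-style superposition, the equal-length coordinate arrays, the 1.0 cutoff, and the function names `superposed_rmsd`, `rmsd_matrix`, `upgma_clusters`, and `demo` are all assumptions made for illustration. The input is synthetic coordinates, not real RNA loops.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def superposed_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    Assumption: a Kabsch-style least-squares fit; the abstract does not
    specify the superposition method.
    """
    a = a - a.mean(axis=0)  # remove translation
    b = b - b.mean(axis=0)
    # Singular values of the covariance matrix give the best achievable overlap.
    u, s, vt = np.linalg.svd(a.T @ b)
    if np.linalg.det(u @ vt) < 0:  # exclude improper rotations (reflections)
        s[-1] = -s[-1]
    msd = (np.sum(a * a) + np.sum(b * b) - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))  # clamp tiny negative rounding error


def rmsd_matrix(fragments):
    """Symmetric matrix of pairwise RMSD values for equal-length fragments."""
    n = len(fragments)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = superposed_rmsd(fragments[i], fragments[j])
    return dist


def upgma_clusters(dist, cutoff):
    """Cut a UPGMA tree at `cutoff`; SciPy's 'average' linkage is UPGMA."""
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=cutoff, criterion="distance")


def demo(seed=0):
    """Synthetic stand-in: two random 'folds', three noisy moved copies each."""
    rng = np.random.default_rng(seed)
    shapes = [rng.normal(scale=5.0, size=(20, 3)) for _ in range(2)]
    fragments = []
    for shape in shapes:
        for _copy in range(3):
            # Random proper rotation from a QR decomposition.
            q, _r = np.linalg.qr(rng.normal(size=(3, 3)))
            if np.linalg.det(q) < 0:
                q[:, 0] = -q[:, 0]
            noise = rng.normal(scale=0.05, size=shape.shape)
            shift = rng.normal(scale=10.0, size=3)
            fragments.append(shape @ q.T + noise + shift)
    dist = rmsd_matrix(fragments)
    return dist, upgma_clusters(dist, cutoff=1.0)


if __name__ == "__main__":
    dist, labels = demo()
    print(np.round(dist, 2))
    print(labels)
```

In this toy setting the three copies of each synthetic fold fall into one cluster regardless of how they were rotated or translated, which is the behavior the abstract relies on when it compares fragments taken from different molecules. With real structures, the choice of atoms to superpose and the cutoff at which the tree is cut would need to be chosen with care.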