Chevalet C, Michot B
Institut National de la Recherche Agronomique, Laboratoire de Génétique Cellulaire, Castanet Tolosan, France.
Comput Appl Biosci. 1992 Jun;8(3):215-25. doi: 10.1093/bioinformatics/8.3.215.
To access the functional information carried by RNA molecules at the level of their secondary structure interactions, we propose a comparison method based on a tree edit algorithm that takes into account the tree structure of RNA foldings. Each secondary structure is translated into a tree containing all of its elementary substructures; a shorter, condensed tree is then built in which every unbranched helix, together with the bulges and interior loops that interrupt it, is treated as a single node. The method has several parameters: a comparison matrix between structural units, gap penalties, and the scoring between nodes of the condensed trees. Their effects were analysed using as a model a rapidly divergent domain of the large ribosomal RNA, for which structural variation during evolution is well known. The method can precisely locate, in large target molecules, specific substructures that share with the query molecules only a limited set of closely related secondary structure features; it remains effective when such structural elements are separated by intervening features, which can correspond to the insertion or deletion of entire stem regions. When coupled with a hierarchical clustering algorithm, the method is suitable for classifying RNA molecules according to their secondary structure homologies.
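The two-level tree construction and the edit-based comparison described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the secondary structure is given in dot-bracket notation (not mentioned in the abstract), keeps only helices in the condensed tree, and uses hypothetical scores (a constant `GAP` penalty and the difference in helix length as the substitution cost) in place of the paper's comparison matrix and node scoring. The names `parse_dot_bracket`, `condense`, and `forest_distance` are invented for illustration.

```python
from dataclasses import dataclass, field
from functools import lru_cache

GAP = 1  # hypothetical penalty for inserting or deleting one condensed node


@dataclass
class Node:
    label: str  # "root", "P" (base pair) or "U" (unpaired base)
    children: list = field(default_factory=list)


def parse_dot_bracket(structure: str) -> Node:
    """Build the full tree: one node per base pair, one leaf per unpaired base."""
    root = Node("root")
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = Node("P")
            stack[-1].children.append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            stack.pop()
        elif ch == ".":
            stack[-1].children.append(Node("U"))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure")
    return root


def condense(node: Node) -> tuple:
    """Return the condensed forest below `node` as nested tuples
    (pairs_in_helix, child_forest). An unbranched run of base pairs,
    including its bulges and interior loops, becomes one entry."""
    forest = []
    for child in node.children:
        if child.label != "P":
            continue  # unpaired bases are absorbed into the enclosing loop
        pairs, current = 1, child
        while True:
            paired = [c for c in current.children if c.label == "P"]
            if len(paired) != 1:
                break  # hairpin loop (0 branches) or multibranch loop (2+)
            current, pairs = paired[0], pairs + 1
        forest.append((pairs, condense(current)))
    return tuple(forest)


@lru_cache(maxsize=None)
def forest_distance(f1: tuple, f2: tuple) -> int:
    """Ordered forest edit distance on condensed trees (memoised recursion)."""
    if not f1 and not f2:
        return 0
    if not f1:  # insert the rightmost root of f2, exposing its children
        return forest_distance((), f2[:-1] + f2[-1][1]) + GAP
    if not f2:  # delete the rightmost root of f1, exposing its children
        return forest_distance(f1[:-1] + f1[-1][1], ()) + GAP
    (s1, k1), (s2, k2) = f1[-1], f2[-1]
    return min(
        forest_distance(f1[:-1] + k1, f2) + GAP,      # delete a helix
        forest_distance(f1, f2[:-1] + k2) + GAP,      # insert a helix
        forest_distance(f1[:-1], f2[:-1])             # match the two helices
        + forest_distance(k1, k2) + abs(s1 - s2),
    )


two_stems = condense(parse_dot_bracket("((..((...))..((...))..))"))
one_stem = condense(parse_dot_bracket("((..((...))..))"))
print(two_stems, one_stem, forest_distance(two_stems, one_stem))
```

In the second structure the interior loop does not split the helix, so the four base pairs collapse into a single condensed node, whereas the first structure branches into two stems and therefore keeps three nodes. A pairwise distance table produced this way could be passed to any standard hierarchical clustering routine, in the spirit of the classification use mentioned in the abstract.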