Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany.
Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
Bioinformatics. 2017 Jul 15;33(14):2089-2096. doi: 10.1093/bioinformatics/btx114.
Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take into account the compensatory base pair changes obtained from structure conservation in orthologous sequences.
Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. The input is a set of multiple structural alignments of RNA sequences, each containing a paralog sequence aligned with its orthologs. For each paralog, RNAscClust computes a minimum free-energy structure using the conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments.
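To illustrate what "conserved base pairs as prior information" can mean in practice, the following is a minimal sketch and not the RNAscClust implementation. It assumes an alignment given as equal-length strings and a consensus structure in dot-bracket notation without pseudoknots. The function names, the `min_fraction` threshold, and the toy sequences are all hypothetical. For each consensus pair it checks how often the paired columns form a canonical pair (including G-U) and counts the distinct pair types observed; more than one type indicates a compensatory change.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(dot_bracket):
    """Return sorted (i, j) index pairs from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def conserved_pairs(alignment, consensus, min_fraction=0.8):
    """List (i, j, n_types) for consensus pairs that are canonical in at least
    min_fraction of the aligned sequences. n_types is the number of distinct
    canonical pair types seen; n_types > 1 indicates compensatory changes.
    min_fraction is an illustrative threshold, not a value from the paper."""
    result = []
    for i, j in parse_pairs(consensus):
        observed = [(s[i].upper(), s[j].upper()) for s in alignment]
        canonical = [p for p in observed if p in CANONICAL]  # gaps never match
        if len(canonical) / len(alignment) >= min_fraction:
            result.append((i, j, len(set(canonical))))
    return result


def constraint_string(length, pairs):
    """Encode the conserved pairs as a dot-bracket style constraint string."""
    chars = ["."] * length
    for i, j, _ in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


# Toy alignment (hypothetical data): one paralog plus two orthologs.
alignment = ["GGCAAAAGCC", "GACAAAAGUC", "CGCAAAAGCG"]
consensus = "(((....)))"
pairs = conserved_pairs(alignment, consensus)
constraints = constraint_string(len(consensus), pairs)
```

In this toy case the two outer pairs each show two distinct pair types, while the innermost pair is conserved without variation. A constraint string of this kind could then be handed to a folding program that supports structure constraints; which program and which constraint syntax RNAscClust uses is not stated in the abstract.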
RNAscClust is available at http://www.bioinf.uni-freiburg.de/Software/RNAscClust.
gorodkin@rth.dk or backofen@informatik.uni-freiburg.de.
Supplementary data are available at Bioinformatics online.