Bossanyi Marc-André, Carpentier Valentin, Glouzon Jean-Pierre S, Ouangraoua Aïda, Anselmetti Yoann
CoBIUS lab, Department of Computer Science, University of Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, QC J1K 2R1, Canada.
NAR Genom Bioinform. 2020 Oct 27;2(4):lqaa086. doi: 10.1093/nargab/lqaa086. eCollection 2020 Dec.
Predicting RNA structure is crucial for understanding RNA's mechanism of action. Comparative approaches for the prediction of RNA structures can be classified into four main strategies. The first three (align-and-fold, align-then-fold and fold-then-align) exploit multiple sequence alignments to improve the accuracy of conserved RNA-structure prediction. Align-and-fold methods generally perform better, but are also typically slower than the other alignment-based methods. The fourth strategy, alignment-free, consists of predicting the conserved RNA structure without relying on sequence alignment. This strategy has the advantage of being the fastest, while still predicting accurate structures through the use of latent representations of the candidate structures for each sequence. This paper presents aliFreeFoldMulti, an extension of the aliFreeFold algorithm. aliFreeFold predicts a representative secondary structure of multiple RNA homologs by using a vector representation of their suboptimal structures. aliFreeFoldMulti improves on aliFreeFold by additionally computing the conserved structure for each sequence. aliFreeFoldMulti is assessed by comparing its prediction performance and time efficiency with those of a set of leading RNA-structure prediction methods. aliFreeFoldMulti has the lowest computing times and the highest maximum accuracy scores. Its average structure prediction accuracy is comparable to that of the other methods, except TurboFoldII, which is the best in terms of average accuracy but has the highest computing times. We present aliFreeFoldMulti as an illustration of the potential of alignment-free approaches to provide fast and accurate RNA-structure prediction methods.
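The alignment-free idea summarized in the abstract (embed each sequence's suboptimal structures as vectors, then choose structures by their position in that vector space) can be illustrated with a minimal sketch. This is not the published aliFreeFoldMulti implementation. The three features used here (hairpin count, stem count, base-pair count), the unweighted centroid, the Euclidean distance, and the names `features` and `select_conserved` are all assumptions made for illustration, and the candidate structures are made-up dot-bracket strings rather than real folding output.

```python
from math import sqrt


def pair_table(db):
    """Map every paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def features(db):
    """Toy feature vector (hairpins, stems, base pairs); hypothetical features."""
    pairs = pair_table(db)
    opening = [(i, j) for i, j in pairs.items() if i < j]
    # A hairpin is a pair that encloses only unpaired positions.
    hairpins = sum(
        1 for i, j in opening if all(k not in pairs for k in range(i + 1, j))
    )
    # A stem starts at pair (i, j) when (i - 1, j + 1) is not also a pair.
    stems = sum(1 for i, j in opening if pairs.get(i - 1) != j + 1)
    return (hairpins, stems, len(opening))


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def select_conserved(candidates):
    """Pick, for each sequence, the candidate closest to the global centroid.

    candidates maps a sequence name to its list of suboptimal structures.
    Returns (name of the sequence holding the overall representative,
    dict of the selected structure per sequence).
    """
    vectors = [features(s) for structs in candidates.values() for s in structs]
    centroid = tuple(sum(col) / len(vectors) for col in zip(*vectors))
    per_seq = {
        name: min(structs, key=lambda s: distance(features(s), centroid))
        for name, structs in candidates.items()
    }
    representative = min(
        per_seq, key=lambda n: distance(features(per_seq[n]), centroid)
    )
    return representative, per_seq


# Made-up suboptimal structures for two hypothetical homologs.
candidates = {
    "seqA": ["(((...)))", "((.....))", "(.).(...)"],
    "seqB": ["((((....))))", "((..))((..))"],
}
representative, per_seq = select_conserved(candidates)
print(representative, per_seq)
```

In this toy setting the single-hairpin candidates lie closest to the centroid, so they are selected for both sequences. No alignment is computed at any point, which is the property the abstract credits for the speed of the alignment-free strategy.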