Gray Mateo, Will Sebastian, Jabbari Hosna
Department of Biomedical Engineering, University of Alberta, Street, Edmonton, T6G2R3, AB, Canada.
Department of Computer Science CNRS/LIX (UMR 7161), Institut Polytechnique de Paris, Street, Paris, 10587, France.
Algorithms Mol Biol. 2024 Mar 3;19(1):9. doi: 10.1186/s13015-024-00256-4.
Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their interactions. These methods find the structure with the minimum free energy (MFE) among exponentially many possible structures and have a restrictive time and space complexity ( time and space for pseudoknot-free structures) for longer RNA sequences. Furthermore, accurate free energy calculations, including dangle contributions can be difficult and costly to implement, particularly when optimizing for time and space requirements.
Here we introduce a fast and efficient sparsified MFE pseudoknot-free structure prediction algorithm, SparseRNAFolD, that utilizes an accurate energy model that accounts for dangle contributions. While the sparsification technique was previously employed to improve the time and space complexity of a pseudoknot-free structure prediction method with a realistic energy model, SparseMFEFold, it was not extended to include dangle contributions due to the complexity of computation. This may come at the cost of prediction accuracy. In this work, we compare three different sparsified implementations for dangle contributions and provide pros and cons of each method. As well, we compare our algorithm to LinearFold, a linear time and space algorithm, where we find that in practice, SparseRNAFolD has lower memory consumption across all lengths of sequence and a faster time for lengths up to 1000 bases.
Our SparseRNAFolD algorithm is an MFE-based algorithm that guarantees optimality of result and employs the most general energy model, including dangle contributions. We provide a basis for applying dangles to sparsified recursion in a pseudoknot-free model that has the potential to be extended to pseudoknots.
通过自由能最小化进行计算RNA二级结构预测对于分析结构RNA及其相互作用而言必不可少。这些方法在指数级数量的可能结构中找到具有最小自由能(MFE)的结构,并且对于较长的RNA序列具有严格的时间和空间复杂度(对于无假结结构为 时间和 空间)。此外,精确的自由能计算,包括悬垂贡献的计算,可能难以实现且成本高昂,特别是在针对时间和空间要求进行优化时。
在此,我们引入了一种快速高效的稀疏化MFE无假结结构预测算法SparseRNAFolD,该算法利用了一种考虑悬垂贡献的精确能量模型。虽然稀疏化技术先前已被用于提高具有实际能量模型的无假结结构预测方法SparseMFEFold的时间和空间复杂度,但由于计算复杂性,它未被扩展以包括悬垂贡献。这可能会以预测准确性为代价。在这项工作中,我们比较了三种不同的用于悬垂贡献的稀疏化实现,并给出了每种方法的优缺点。同样,我们将我们的算法与线性时间和空间算法LinearFold进行了比较,我们发现在实际应用中,对于所有长度的序列,SparseRNAFolD的内存消耗更低,对于长度达1000个碱基的序列,其运行时间更快。
我们的SparseRNAFolD算法是一种基于MFE的算法,可确保结果的最优性,并采用了最通用的能量模型,包括悬垂贡献。我们为在无假结模型中将悬垂应用于稀疏化递归提供了基础,该模型有可能扩展到假结。