Miladi Milad, Raden Martin, Will Sebastian, Backofen Rolf
Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, Germany.
Theoretical Biochemistry Group (TBI), Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, Vienna, Austria.
Algorithms Mol Biol. 2020 Nov 13;15(1):19. doi: 10.1186/s13015-020-00179-w.
Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of [Formula: see text] in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments.
Here, we introduce a novel variant of Sankoff's algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities that allow to accurately capture structure probabilities, which obey a conditional dependency. This model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains).
Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of the Sankoff's algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA.
RNA的同时比对与折叠(SA&F)是推断非编码RNA结构及其一般分析不可或缺的黄金标准。由桑科夫提出的原始算法,在全能量模型中以[公式:见正文]的复杂度精确解决了理论问题。在过去二十年中,人们提出了桑科夫算法的几种变体和改进方法,通过提出简化的能量模型或对预测比对施加限制来降低其极高的复杂度。
在此,我们介绍了一种桑科夫算法的新型变体,它将PMcomp的简化方法(即从全能量模型转变为更简单的基于碱基对的模型)与基于环的全能量模型的准确性相结合。我们的模型并非根据无条件碱基对概率估计伪能量,而是从条件碱基对概率计算能量,这使得能够准确捕捉遵循条件依赖性的结构概率。该模型催生了快速且高度准确的新型算法Pankov(受马尔可夫链启发的类似概率桑科夫的RNA同时比对与折叠算法)。
Pankov受益于排除不可靠碱基配对带来的加速,同时又不影响桑科夫算法基于环的自由能模型。我们表明,Pankov在折叠质量上优于其前身LocARNA和SPARSE,并且比LocARNA更快。