Gray Mateo, Trinity Luke, Stege Ulrike, Ponty Yann, Will Sebastian, Jabbari Hosna
Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta T6G 1H9, Canada.
Department of Computer Science, University of Victoria, Victoria, British Columbia V8P 5C2, Canada.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btae748.
Biologically relevant RNA secondary structures are routinely predicted by efficient dynamic programming algorithms that minimize their free energy. Starting from such algorithms, one can devise partition function algorithms, which enable stochastic perspectives on RNA structure ensembles. As the most prominent example, McCaskill's partition function algorithm is derived from pseudoknot-free energy minimization. While this algorithm became hugely successful for the analysis of pseudoknot-free RNA structure ensembles, as of yet there exists only one pseudoknotted partition function implementation, which covers only simple pseudoknots and comes with a borderline-prohibitive complexity of O(n5) in the RNA length n.
Here, we develop a partition function algorithm corresponding to the hierarchical pseudoknot prediction of HFold, which performs exact optimization in a realistic pseudoknot energy model. In consequence, our algorithm CParty carries over HFold's advantages over classical pseudoknot prediction in characterizing the Boltzmann ensemble at equilibrium. Given an RNA sequence S and a pseudoknot-free structure G, CParty computes the partition function over all possibly pseudoknotted density-2 structures G∪G' of S that extend the fixed G by a disjoint pseudoknot-free structure G'. Thus, CParty follows the common hypothesis of hierarchical pseudoknot formation, where pseudoknots form as tertiary contacts only after a first pseudoknot-free "core" G and we call the computed partition function hierarchically constrained (by G). Like HFold, the dynamic programming algorithm CParty is very efficient, achieving the low complexity of the pseudoknot-free algorithm, i.e. cubic time and quadratic space. Finally, by computing pseudoknotted ensemble energies, we unveil kinetics features of a therapeutic target in SARS-CoV-2.
CParty is available at https://github.com/HosnaJabbari/CParty.
生物学上相关的RNA二级结构通常由高效的动态规划算法预测,这些算法将其自由能最小化。从这类算法出发,可以设计出配分函数算法,从而能够从随机角度看待RNA结构集合。最突出的例子是,麦卡斯基尔的配分函数算法源自无假结自由能最小化。虽然该算法在分析无假结RNA结构集合方面取得了巨大成功,但截至目前,只有一种假结配分函数实现,它仅涵盖简单假结,并且在RNA长度n上具有高达O(n5)的复杂度,几乎令人望而却步。
在此,我们开发了一种与HFold的分层假结预测相对应的配分函数算法,该算法在实际的假结能量模型中进行精确优化。因此,我们的CParty算法继承了HFold在表征平衡态玻尔兹曼集合方面相对于经典假结预测的优势。给定一个RNA序列S和一个无假结结构G,CParty会计算S的所有可能的假结密度为2的结构G∪G'上的配分函数,其中G'是一个与G不相交的无假结结构,用于扩展固定的G。因此,CParty遵循分层假结形成的常见假设,即假结仅在第一个无假结的“核心”G形成之后才作为三级接触形成,我们将计算出的配分函数称为(由G)分层约束的。与HFold一样,动态规划算法CParty非常高效,实现了无假结算法的低复杂度,即立方时间和二次空间。最后,通过计算假结集合能量,我们揭示了SARS-CoV-2中一个治疗靶点的动力学特征。