Kirkpatrick Anna, Patton Kalen, Tetali Prasad, Mitchell Cassie
School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA.
School of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Math Comput Appl. 2020 Dec;25(4). doi: 10.3390/mca25040067. Epub 2020 Oct 10.
Ribonucleic acid (RNA) secondary structures and branching properties are important for determining functional ramifications in biology. While energy minimization of the Nearest Neighbor Thermodynamic Model (NNTM) is commonly used to identify such properties (number of hairpins, maximum ladder distance, etc.), it is difficult to know whether the resultant values fall within expected dispersion thresholds for a given energy function. The goal of this study was to construct a Markov chain capable of examining the dispersion of RNA secondary structures and branching properties obtained from NNTM energy function minimization independent of a specific nucleotide sequence. Plane trees are studied as a model for RNA secondary structure, with energy assigned to each tree based on the NNTM, and a corresponding Gibbs distribution is defined on the trees. Through a bijection between plane trees and 2-Motzkin paths, a Markov chain converging to the Gibbs distribution is constructed, and fast mixing time is established by estimating the spectral gap of the chain. The spectral gap estimate is obtained through a series of decompositions of the chain and also by building on known mixing time results for other chains on Dyck paths. The resulting algorithm can be used as a tool for exploring the branching structure of RNA, especially for long sequences, and to examine branching structure dependence on energy model parameters. Full exposition is provided for the mathematical techniques used with the expectation that these techniques will prove useful in bioinformatics, computational biology, and additional extended applications.
核糖核酸(RNA)的二级结构和分支特性对于确定生物学中的功能分支很重要。虽然最近邻热力学模型(NNTM)的能量最小化通常用于识别此类特性(发夹数量、最大梯级距离等),但很难知道所得值是否落在给定能量函数的预期离散阈值范围内。本研究的目的是构建一个马尔可夫链,能够独立于特定核苷酸序列检查从NNTM能量函数最小化获得的RNA二级结构和分支特性的离散情况。平面树被作为RNA二级结构的模型进行研究,根据NNTM为每棵树分配能量,并在这些树上定义相应的吉布斯分布。通过平面树与2 - 莫兹金路径之间的双射,构建了一个收敛到吉布斯分布的马尔可夫链,并通过估计该链的谱隙建立了快速混合时间。谱隙估计是通过对链的一系列分解以及基于对戴克路径上其他链的已知混合时间结果得到的。所得算法可作为探索RNA分支结构的工具,特别是对于长序列,并用于检查分支结构对能量模型参数的依赖性。文中对所使用的数学技术进行了全面阐述,期望这些技术在生物信息学、计算生物学及其他扩展应用中被证明是有用的。