Department of Computer Science and Operations Research, and the Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Quebec H3C 3J7, Canada.
Electronic address: https://twitter.com/francois_major.
J Mol Biol. 2023 Aug 1;435(15):168181. doi: 10.1016/j.jmb.2023.168181. Epub 2023 Jun 9.
The common structural elements of a family of functionally related RNA sequences are usually identified from an alignment of the sequences, which is often subject to human bias and may be inaccurate. The resulting covariance model (CM) gives, for each base, the probability that it covaries with another, which provides evolutionary support for the formation of double-helical regions and possibly pseudoknots. Because RNA is dynamic and alternative folds coexist, a CM may omit some motifs. To overcome this limitation, we present D-ORB, a system of algorithms that identifies motifs that are overrepresented in the secondary conformational landscapes of a family when compared to those of unrelated sequences. The algorithms are bundled into an easy-to-use website where users submit a family and, optionally, unrelated sequences. D-ORB produces a non-pseudoknotted secondary structure based on the overrepresented motifs, a deep neural network classifier and two decision trees. When used to model an Rfam family, D-ORB fits overrepresented motifs in the corresponding Rfam structure; more than a hundred Rfam families have been modeled. The statistical approach behind D-ORB derives the structural composition of an RNA family, making it a valuable tool for analyzing and modeling such families. Its easy-to-use interface and advanced algorithms make it an essential resource for researchers studying RNA structure. D-ORB is available at https://d-orb.major.iric.ca/.
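The idea of scoring motifs by overrepresentation in a family relative to unrelated sequences can be illustrated with a minimal sketch. This is not D-ORB's algorithm: the abstract does not specify its motif definition or statistics. Here a "motif" is assumed to be a fixed-length window of a dot-bracket string, each input is assumed to be a precomputed secondary structure, and the score is a smoothed log-odds ratio. The function names (`motif_counts`, `overrepresented`) and the parameters (`k`, `pseudo`, `min_log_odds`) are all hypothetical.

```python
from collections import Counter
from math import log2


def motif_counts(structures, k=5):
    """Count, for each length-k dot-bracket window, how many structures contain it."""
    counts = Counter()
    for s in structures:
        # Use a set so each structure contributes at most once per window.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def overrepresented(family, background, k=5, pseudo=0.5, min_log_odds=1.0):
    """Return {window: log2 odds} for windows enriched in `family` vs `background`.

    Assumption: a toy enrichment score with pseudocounts, not D-ORB's statistic.
    """
    fam = motif_counts(family, k)
    bg = motif_counts(background, k)
    scores = {}
    for motif, count in fam.items():
        # Smoothed fraction of structures containing the window in each set.
        p_fam = (count + pseudo) / (len(family) + 2 * pseudo)
        p_bg = (bg.get(motif, 0) + pseudo) / (len(background) + 2 * pseudo)
        score = log2(p_fam / p_bg)
        if score >= min_log_odds:
            scores[motif] = score
    # Highest-scoring windows first.
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))


if __name__ == "__main__":
    # Made-up structures, for illustration only.
    family = ["((((...))))", "..(((...)))", "(((....)))."]
    background = ["...........", "((......)).", ".((.....))."]
    for motif, score in overrepresented(family, background).items():
        print(motif, round(score, 2))
```

A real analysis would sample many suboptimal structures per sequence and use a richer motif vocabulary (stems, loops, bulges) than raw string windows. The comparison against a background set is the part this sketch is meant to convey.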