Pietrosanto Marco, Adinolfi Marta, Guarracino Andrea, Ferrè Fabrizio, Ausiello Gabriele, Vitale Ilio, Helmer-Citterich Manuela
Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy.
Department of Pharmacy and Biotechnology (FaBiT), University of Bologna Alma Mater, Via Belmeloro 6, 40126 Bologna, Italy.
NAR Genom Bioinform. 2021 Feb 15;3(1):lqab007. doi: 10.1093/nargab/lqab007. eCollection 2021 Mar.
Structural characterization of RNAs is a dynamic field, offering many modelling possibilities. RNA secondary structure models are usually characterized by an encoding that depicts the structural information of the molecule through string representations or graphs. In this work, we generalize the BEAR encoding (a context-aware structural encoding we previously developed) by expanding the set of alignments used to construct substitution matrices and then applying it to secondary structure encodings ranging from fine-grained to more coarse-grained representations. We also introduce a re-interpretation of the Shannon information applied to RNA alignments, proposing a new scoring metric, the Relative Information Gain (RIG). The RIG score is available for any position in an alignment and shows how the different levels of detail encoded in the RNA representation contribute differently to conveying structural information. The approaches presented in this study can be used alongside state-of-the-art tools to gain insights into the structural elements that RNAs and RNA families are composed of. This additional information could help improve those tools or increase the degree of confidence in the secondary structure of RNA families and of any set of aligned RNAs.
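The abstract does not give the formula for RIG, so the following is only a minimal sketch of the standard per-position Shannon information content of an alignment, the quantity that the RIG is described as re-interpreting. The function name `column_information`, the toy dot-bracket alignment, and the three-symbol alphabet are illustrative assumptions, not part of the published method, and gaps are not handled.

```python
import math
from collections import Counter

def column_information(alignment, alphabet_size):
    """Per-column information content in bits: log2(alphabet_size) - H(column).

    `alignment` is a list of equal-length strings over a structural alphabet.
    This is the classic Shannon measure, NOT the RIG score from the paper.
    """
    max_entropy = math.log2(alphabet_size)
    scores = []
    for column in zip(*alignment):
        total = len(column)
        counts = Counter(column)
        # Shannon entropy of the symbol distribution observed in this column
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(max_entropy - entropy)
    return scores

# Hypothetical toy alignment in dot-bracket notation (alphabet: '(', ')', '.')
toy_alignment = ["((..))", "((..))", "(.().)", "((..))"]
scores = column_information(toy_alignment, alphabet_size=3)
for position, score in enumerate(scores):
    print(position, round(score, 3))
```

Fully conserved columns reach the maximum of log2(3) bits, while columns with mixed symbols score lower. Running the same function on a coarser or finer structural alphabet would be one way to compare how much each level of detail contributes at a given position.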