

Predicting RNA sequence-structure likelihood via structure-aware deep learning.

Affiliations

School of Computing and Augmented Intelligence, Arizona State University, 699 S Mill Ave, Tempe, AZ, 85281, USA.

ASU-Mayo Center for Innovative Imaging, Arizona State University, 699 S Mill Ave, Tempe, AZ, 85281, USA.

Publication information

BMC Bioinformatics. 2024 Sep 30;25(1):316. doi: 10.1186/s12859-024-05916-1.

Abstract

BACKGROUND

RNA function is known to depend heavily on both structure and sequence. A model that can accurately evaluate a design, given an RNA sequence-structure pair, would therefore be a valuable tool for many researchers. Machine learning methods have been explored for building such tools and have shown promising results. However, two key issues remain. First, the performance of machine learning models depends on the features used to characterize RNA, and there is currently no consensus on which features most effectively characterize RNA sequence-structure pairs. Second, most existing machine learning methods extract features that describe the entire RNA molecule. We argue that it is essential to define additional features that characterize individual nucleotides and specific sections of the RNA structure in order to improve the overall efficacy of the RNA design process.

RESULTS

We develop two deep learning models for evaluating RNA sequence-secondary structure pairs. The first model, NU-ResNet, uses a convolutional neural network architecture that addresses the aforementioned problems by explicitly encoding RNA sequence-structure information into a 3D matrix. Building upon NU-ResNet, our second model, NUMO-ResNet, incorporates additional information derived from the characterizations of RNA, specifically the 2D folding motifs. In this work, we introduce an automated method to extract these motifs from fundamental secondary structure descriptions. We evaluate the performance of both models on an independent testing dataset, where our proposed models outperform the models from the literature. To assess the robustness of our models, we conduct 10-fold cross-validation. To evaluate the generalization ability of NU-ResNet and NUMO-ResNet across RNA families, we train and test our proposed models on different RNA families. Our proposed models show superior performance compared to the models from the literature when tested across different independent RNA families.
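The abstract does not give the layout of the 3D matrix or the motif-extraction procedure, so the following is only a minimal sketch of what encoding a sequence-structure pair into a 3D array and finding a simple motif might look like. It assumes the secondary structure is given in dot-bracket notation without pseudoknots. The channel layout (16 one-hot nucleotide-pair channels plus one base-pairing channel) and the function names `pair_table`, `encode_pair`, and `hairpin_loops` are hypothetical and are not taken from NU-ResNet or NUMO-ResNet.

```python
import numpy as np

NUCLEOTIDES = "ACGU"  # assumed alphabet; index in this string is the class id


def pair_table(structure: str) -> list[int]:
    """Return the partner index of each position (-1 if unpaired)."""
    partners = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return partners


def encode_pair(sequence: str, structure: str) -> np.ndarray:
    """Encode a pair as an L x L x 17 array (hypothetical layout).

    Channels 0..15: one-hot of the nucleotide pair (sequence[i], sequence[j]).
    Channel 16: 1.0 where positions i and j are base-paired.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    idx = np.array([NUCLEOTIDES.index(c) for c in sequence.upper()])
    n = len(idx)
    tensor = np.zeros((n, n, 17), dtype=np.float32)
    rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    tensor[rows, cols, idx[rows] * 4 + idx[cols]] = 1.0
    for i, j in enumerate(pair_table(structure)):
        if j >= 0:
            tensor[i, j, 16] = 1.0
    return tensor


def hairpin_loops(structure: str) -> list[tuple[int, int]]:
    """Return closing pairs (i, j) that enclose only unpaired positions."""
    partners = pair_table(structure)
    return [
        (i, j)
        for i, j in enumerate(partners)
        if j > i and all(p == -1 for p in partners[i + 1 : j])
    ]


if __name__ == "__main__":
    x = encode_pair("GGGAAACCC", "(((...)))")
    print(x.shape)
    print(hairpin_loops("(((...)))"))
```

An array of this kind can be fed to a 2D convolutional network with the last axis treated as channels, and per-motif annotations such as the hairpin positions could be added as further channels; whether the published models do either in this way is not stated in the abstract.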

CONCLUSIONS

In this study, we propose two deep learning models, NU-ResNet and NUMO-ResNet, to evaluate RNA sequence-secondary structure pairs. These two models expand the field of data-driven approaches for learning RNA. Furthermore, they provide a new method for encoding RNA sequence-secondary structure pairs.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b397/11443715/c675db75f817/12859_2024_5916_Fig1_HTML.jpg
