Improving RNA Secondary Structure Prediction Through Expanded Training Data.

Authors

Langeberg Conner J, Kim Taehan, Nagle Roma, Meredith Charlotte, Garuadapuri Dimple Amitha, Doudna Jennifer A, Cate Jamie H D

Affiliations

Innovative Genomics Institute; University of California, Berkeley, CA, USA.

California Institute for Quantitative Biosciences (QB3), University of California, Berkeley, CA, USA.

Publication

bioRxiv. 2025 May 3:2025.05.03.652028. doi: 10.1101/2025.05.03.652028.

DOI:10.1101/2025.05.03.652028
PMID:40654677
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12247784/

Abstract

In recent years, deep learning has revolutionized protein structure prediction, achieving remarkable speed and accuracy. RNA structure prediction, however, has lagged behind. Although several methods have shown moderate success in predicting RNA secondary and tertiary structures, none has reached the accuracy of contemporary protein models. One proposed explanation for this shortfall is the limited amount of high-quality structural data available for training. To probe this proposed limitation, we developed RNASSTR, a large and diverse dataset of RNA sequences paired with their corresponding secondary structures. We assessed the utility of this expanded dataset by retraining two deep learning models, SincFold and MXfold2. Retrained SincFold generalized better to some previously unseen RNA families, improving its ability to predict accurate de novo RNA secondary structures. By contrast, retraining MXfold2 on the large RNASSTR dataset proved too computationally expensive, and it did not achieve high performance on the test set. The RNASSTR dataset represents a substantial advance for RNA structure modeling and lays a strong foundation for developing future RNA secondary structure prediction algorithms.
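The abstract does not state the file format of the sequence/structure pairs or the metric used to score predictions. As a minimal sketch, assuming structures are written in dot-bracket notation (a common convention for secondary structures) and that accuracy is scored as base-pair F1 (a metric widely used in RNA secondary structure benchmarking), the Python below shows how paired data can be turned into base-pair sets and scored. The `family_holdout_split` helper and the record keys `family`, `sequence` and `structure` are hypothetical; they illustrate one way to test generalization to unseen RNA families and are not the RNASSTR format or the authors' pipeline.

```python
"""Sketch: base-pair F1 scoring and a family-held-out split for
sequence/structure pairs stored in dot-bracket notation (assumed format)."""


def dot_bracket_to_pairs(structure: str) -> set[tuple[int, int]]:
    """Return the 0-based (i, j) base pairs, with i < j, encoded by a dot-bracket string.

    Round, square, curly and angle brackets are tracked on separate stacks,
    so simple pseudoknots written with mixed brackets are supported.
    """
    closing_to_opening = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks: dict[str, list[int]] = {o: [] for o in closing_to_opening.values()}
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(structure):
        if ch in stacks:                      # opening bracket: remember its position
            stacks[ch].append(j)
        elif ch in closing_to_opening:        # closing bracket: pair with latest opener
            stack = stacks[closing_to_opening[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":                       # '.' marks an unpaired base
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    for opening, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opening!r} at position {stack[-1]}")
    return pairs


def base_pair_f1(predicted: str, reference: str) -> float:
    """F1 score over base pairs, counting only exact position matches."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference structures differ in length")
    pred = dot_bracket_to_pairs(predicted)
    ref = dot_bracket_to_pairs(reference)
    if not pred and not ref:
        return 1.0                            # convention chosen here: both unpaired = perfect
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    return 2 * precision * recall / (precision + recall)


def family_holdout_split(
    records: list[dict], test_families: set[str]
) -> tuple[list[dict], list[dict]]:
    """Put every record from a held-out family in the test set (hypothetical record layout)."""
    train: list[dict] = []
    test: list[dict] = []
    for rec in records:
        (test if rec["family"] in test_families else train).append(rec)
    return train, test


if __name__ == "__main__":
    records = [
        {"family": "tRNA", "sequence": "GGGAAACCC", "structure": "(((...)))"},
        {"family": "5S_rRNA", "sequence": "GGAAAAACC", "structure": "((.....))"},
    ]
    train, test = family_holdout_split(records, {"5S_rRNA"})
    print(len(train), len(test))                              # 1 1
    print(f"{base_pair_f1('((.....))', '(((...)))'):.2f}")    # 0.80
```

Holding out whole families, rather than randomly chosen sequences, keeps close homologs of the test structures out of training, which is what makes an evaluation on previously unseen RNA families informative.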

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6972/12247784/99be46228d4d/nihpp-2025.05.03.652028v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6972/12247784/127404449c87/nihpp-2025.05.03.652028v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6972/12247784/e4dcd7a7d253/nihpp-2025.05.03.652028v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6972/12247784/08609cd45f31/nihpp-2025.05.03.652028v1-f0004.jpg

Similar articles

1. Improving RNA Secondary Structure Prediction Through Expanded Training Data.
   bioRxiv. 2025 May 3:2025.05.03.652028. doi: 10.1101/2025.05.03.652028.
2. Prescription of Controlled Substances: Benefits and Risks.
3. Short-Term Memory Impairment.
4. Anterior Approach Total Ankle Arthroplasty with Patient-Specific Cut Guides.
   JBJS Essent Surg Tech. 2025 Aug 15;15(3). doi: 10.2106/JBJS.ST.23.00027. eCollection 2025 Jul-Sep.
5. Management of urinary stones by experts in stone disease (ESD 2025).
   Arch Ital Urol Androl. 2025 Jun 30;97(2):14085. doi: 10.4081/aiua.2025.14085.
6. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
   Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
7. Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.
   Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.
8. Assessing the comparative effects of interventions in COPD: a tutorial on network meta-analysis for clinicians.
   Respir Res. 2024 Dec 21;25(1):438. doi: 10.1186/s12931-024-03056-x.
9. Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.
   Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3.
10. Interventions for managing asthma in pregnancy.
    Cochrane Database Syst Rev. 2014 Oct 21;2014(10):CD010660. doi: 10.1002/14651858.CD010660.pub2.
