
Comprehensive datasets for RNA design, machine learning, and beyond.

Authors

Badura Jan, Rybarczyk Agnieszka, Zok Tomasz

Affiliations

Institute of Computing Science, Poznan University of Technology, 60-965, Poznan, Poland.

Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704, Poznan, Poland.

Publication information

Sci Rep. 2025 Jul 1;15(1):21417. doi: 10.1038/s41598-025-07041-2.

DOI: 10.1038/s41598-025-07041-2
PMID: 40594473

Abstract

RNA molecules are essential in regulating biological processes such as gene expression, cellular differentiation, and development. Accurately predicting RNA secondary structures and designing sequences that fold into specific configurations remain significant challenges in computational biology, with far-reaching implications for medicine, synthetic biology, and biotechnology. While machine learning methodologies have been proposed to enhance prediction capabilities, they require high-quality training data. The lack of standardized benchmark datasets further hinders the development and evaluation of these tools. To address this, we created a comprehensive dataset of over 320 thousand instances from experimentally validated sources to establish a new community-wide benchmark for RNA design and modeling algorithms. Our dataset comprises numerous challenging structures for which state-of-the-art RNA inverse folders provide results of varying accuracy. We demonstrated the potential of the dataset by testing it with several popular open-source RNA design algorithms. Furthermore, we illustrated how our dataset can be used to train machine learning models that consider both RNA sequence and structure, potentially advancing RNA design and prediction capabilities.
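
To make the design task concrete: an RNA design (inverse folding) instance pairs a target secondary structure with candidate sequences. The sketch below assumes the structure is written in dot-bracket notation without pseudoknots, which is an assumption here, since the abstract does not state the dataset's file format. The helper names `parse_pairs` and `is_compatible` are hypothetical and are not taken from the paper. The code only checks that a candidate sequence could form every base pair the target structure demands; it does not predict how the sequence actually folds.

```python
# Illustrative sketch only. Dot-bracket input and both helper names are
# assumptions, not the dataset's documented format or the authors' code.

# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure):
    """Return sorted (i, j) index pairs from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every pair required by the structure is a canonical pair in the sequence."""
    if len(sequence) != len(structure):
        return False
    seq = sequence.upper()
    return all((seq[i], seq[j]) in CANONICAL for i, j in parse_pairs(structure))
```

For example, `is_compatible("GCAAAGC", "((...))")` holds because both required pairs are G-C or C-G, while `is_compatible("GAAAAGC", "((...))")` fails on the inner A-G pair. A compatibility check like this is only a necessary condition; a design benchmark would additionally fold each candidate with a structure predictor and compare the result with the target.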
