StructmRNA a BERT based model with dual level and conditional masking for mRNA representation.

Affiliations

Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Ontario, Canada.

Department of Computer Engineering, University of Zanjan, Zanjan, Iran.

Publication information

Sci Rep. 2024 Oct 29;14(1):26043. doi: 10.1038/s41598-024-77172-5.

DOI: 10.1038/s41598-024-77172-5
PMID: 39472486
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11522565/

Abstract

In this study, we introduce StructmRNA, a new BERT-based model that was designed for the detailed analysis of mRNA sequences and structures. The success of DNABERT in understanding the intricate language of non-coding DNA with bidirectional encoder representations is extended to mRNA with StructmRNA. This new model uses a special dual-level masking technique that covers both sequence and structure, along with conditional masking. This enables StructmRNA to adeptly generate meaningful embeddings for mRNA sequences, even in the absence of explicit structural data, by capitalizing on the intricate sequence-structure correlations learned during extensive pre-training on vast datasets. Compared to well-known models like those in the Stanford OpenVaccine project, StructmRNA performs better in important tasks such as predicting RNA degradation. Thus, StructmRNA can inform better RNA-based treatments by predicting the secondary structures and biological functions of unseen mRNA sequences. The proficiency of this model is further confirmed by rigorous evaluations, revealing its unprecedented ability to generalize across various organisms and conditions, thereby marking a significant advance in the predictive analysis of mRNA for therapeutic design. With this work, we aim to set a new standard for mRNA analysis, contributing to the broader field of genomics and therapeutic development.
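
The abstract names two masking ingredients, dual-level masking over sequence and structure and conditional masking, but does not spell out the procedure. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. It treats each nucleotide and its dot-bracket structure symbol as parallel single-character tokens (StructmRNA may tokenize differently; DNABERT, for instance, uses overlapping k-mers). It masks each track independently with hypothetical probabilities `p_seq` and `p_struct`. It models conditional masking as a hypothetical rule, `p_drop_struct`, that hides the entire structure track so the model must infer structure from sequence alone. The function `dual_level_mask` and all of its parameters are illustrative names.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def dual_level_mask(
    seq: str,
    struct: str,
    p_seq: float = 0.15,         # hypothetical per-nucleotide mask rate
    p_struct: float = 0.15,      # hypothetical per-structure-symbol mask rate
    p_drop_struct: float = 0.2,  # hypothetical chance of hiding all structure
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str], List[int], List[int]]:
    """Mask an mRNA sequence and its dot-bracket structure in parallel.

    Returns (seq_tokens, struct_tokens, seq_targets, struct_targets); the
    *_targets lists hold the positions a model would have to reconstruct.
    This is an illustrative sketch, not StructmRNA's published procedure.
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have equal length")
    if set(seq) - set("ACGU"):
        raise ValueError("sequence must contain only A, C, G, U")
    if set(struct) - set("()."):
        raise ValueError("structure must be in dot-bracket notation")
    if rng is None:
        rng = random.Random()

    seq_tokens = list(seq)
    struct_tokens = list(struct)
    seq_targets: List[int] = []
    struct_targets: List[int] = []

    # Conditional rule (assumption): for some examples, hide the whole
    # structure track to mimic inputs that arrive without structural data.
    drop_structure = rng.random() < p_drop_struct

    for i in range(len(seq)):
        # Sequence level: hide individual nucleotides.
        if rng.random() < p_seq:
            seq_tokens[i] = MASK
            seq_targets.append(i)
        # Structure level: hide individual pairing symbols, or all of them
        # when the conditional rule fired above.
        if drop_structure or rng.random() < p_struct:
            struct_tokens[i] = MASK
            struct_targets.append(i)

    return seq_tokens, struct_tokens, seq_targets, struct_targets


if __name__ == "__main__":
    # Toy hairpin: three G-C pairs closing a three-nucleotide loop.
    demo_rng = random.Random(0)
    seq_tok, struct_tok, seq_t, struct_t = dual_level_mask(
        "GGGAAACCC", "(((...)))", rng=demo_rng
    )
    print(seq_tok)
    print(struct_tok)
```

A masked-language-model objective would then ask the encoder to recover the tokens at `seq_targets` and `struct_targets`. Hiding the whole structure track on some examples is one plausible way to encourage the behaviour the abstract describes, namely meaningful embeddings when no explicit structure is supplied; whether the paper uses this exact rule is not stated here.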

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8b/11522565/e490130f4d66/41598_2024_77172_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8b/11522565/f26bc48eaec5/41598_2024_77172_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8b/11522565/afc906c56cd2/41598_2024_77172_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8b/11522565/0417d744a115/41598_2024_77172_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8b/11522565/5a26d96bdeae/41598_2024_77172_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8b/11522565/63f8bceb2125/41598_2024_77172_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8b/11522565/6429c9ef96c9/41598_2024_77172_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8b/11522565/4da4b0a628e3/41598_2024_77172_Fig8_HTML.jpg

Similar articles

1. StructmRNA a BERT based model with dual level and conditional masking for mRNA representation.
   Sci Rep. 2024 Oct 29;14(1):26043. doi: 10.1038/s41598-024-77172-5.
2. Prescription of Controlled Substances: Benefits and Risks.
3. Trajectory-Ordered Objectives for Self-Supervised Representation Learning of Temporal Healthcare Data Using Transformers: Model Development and Evaluation Study.
   JMIR Med Inform. 2025 Jun 4;13:e68138. doi: 10.2196/68138.
4. Aspects of Genetic Diversity, Host Specificity and Public Health Significance of Single-Celled Intestinal Parasites Commonly Observed in Humans and Mostly Referred to as 'Non-Pathogenic'.
   APMIS. 2025 Sep;133(9):e70036. doi: 10.1111/apm.70036.
5. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
   Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
6. Domain-Specific Pretraining of NorDeClin-Bidirectional Encoder Representations From Transformers for Code Prediction in Norwegian Clinical Texts: Model Development and Evaluation Study.
   JMIR AI. 2025 Aug 25;4:e66153. doi: 10.2196/66153.
7. Can we improve time to patency with vasoepididymostomy with an innovative epididymal occlusion stitch?
   Int Braz J Urol. 2024 Jul-Aug;50(4):504-506. doi: 10.1590/S1677-5538.IBJU.2024.0222.
8. Detecting Redundant Health Survey Questions by Using Language-Agnostic Bidirectional Encoder Representations From Transformers Sentence Embedding: Algorithm Development Study.
   JMIR Med Inform. 2025 Jun 10;13:e71687. doi: 10.2196/71687.
9. Short-Term Memory Impairment.
10. A New Measure of Quantified Social Health Is Associated With Levels of Discomfort, Capability, and Mental and General Health Among Patients Seeking Musculoskeletal Specialty Care.
    Clin Orthop Relat Res. 2025 Apr 1;483(4):647-663. doi: 10.1097/CORR.0000000000003394. Epub 2025 Feb 5.

Cited by

1. annATAC: automatic cell type annotation for scATAC-seq data based on language model.
   BMC Biol. 2025 May 28;23(1):145. doi: 10.1186/s12915-025-02244-5.
