StructmRNA: A BERT-based model with dual-level and conditional masking for mRNA representation.

Affiliations

Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Ontario, Canada.

Department of Computer Engineering, University of Zanjan, Zanjan, Iran.

Publication information

Sci Rep. 2024 Oct 29;14(1):26043. doi: 10.1038/s41598-024-77172-5.

Abstract

In this study, we introduce StructmRNA, a new BERT-based model designed for the detailed analysis of mRNA sequences and structures. StructmRNA extends to mRNA the approach of DNABERT, which used bidirectional encoder representations to model the intricate "language" of non-coding DNA. The model combines a dual-level masking technique, which masks both the sequence and the structure, with conditional masking. By exploiting the sequence-structure correlations learned during extensive pre-training on large datasets, StructmRNA generates meaningful embeddings for mRNA sequences even when no explicit structural data are available. StructmRNA outperforms well-known models, such as those developed in the Stanford OpenVaccine project, on important tasks such as predicting RNA degradation. Because it can predict the secondary structures and biological functions of unseen mRNA sequences, StructmRNA can inform the design of better RNA-based treatments. Rigorous evaluations further confirm the model's proficiency and reveal an unprecedented ability to generalize across various organisms and conditions, marking a significant advance in the predictive analysis of mRNA for therapeutic design. With this work, we aim to set a new standard for mRNA analysis and to contribute to the broader fields of genomics and therapeutic development.
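The abstract does not specify how the two masking levels or the conditioning rule are implemented. The Python sketch below illustrates one possible reading under explicit assumptions. The structure track is assumed to be a dot-bracket string aligned position by position with the nucleotide sequence. Each track is masked independently at a BERT-style rate (15% by default). "Conditional masking" is modeled, purely as an assumption, as occasionally hiding the entire structure track so that the model must infer structure from sequence alone. The function `dual_level_mask`, the `[MASK]` token, and all probabilities are hypothetical illustrations and are not taken from StructmRNA.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token shared by both tracks


def dual_level_mask(
    seq: str,
    struct: str,
    p_seq: float = 0.15,
    p_struct: float = 0.15,
    p_drop_struct: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str], List[int], List[int]]:
    """Hypothetical sketch of dual-level masking with an assumed conditional rule.

    Returns the masked sequence tokens, the masked structure tokens, and the
    positions chosen as prediction targets on each track.
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must be aligned (equal length)")
    if rng is None:
        rng = random.Random()

    # Level 1: choose nucleotide positions to mask.
    seq_targets = [i for i in range(len(seq)) if rng.random() < p_seq]

    # Assumed conditional rule: with probability p_drop_struct, hide the whole
    # structure track so the model must rely on the sequence alone.
    if rng.random() < p_drop_struct:
        struct_targets = list(range(len(struct)))
    else:
        # Level 2: otherwise mask dot-bracket symbols independently.
        struct_targets = [i for i in range(len(struct)) if rng.random() < p_struct]

    seq_set, struct_set = set(seq_targets), set(struct_targets)
    seq_tokens = [MASK if i in seq_set else base for i, base in enumerate(seq)]
    struct_tokens = [MASK if i in struct_set else sym for i, sym in enumerate(struct)]
    return seq_tokens, struct_tokens, seq_targets, struct_targets


if __name__ == "__main__":
    # A short hairpin: G-C pairs at (0, 6) and (1, 5), loop of three A's.
    tokens, structure, _, _ = dual_level_mask("GGAAACC", "((...))", rng=random.Random(42))
    print(tokens)
    print(structure)
```

For example, calling `dual_level_mask("GGAAACC", "((...))", p_drop_struct=1.0)` returns a structure track in which every position is `[MASK]`, mimicking an mRNA for which no structural data are available. In a full model, the returned target positions would feed separate sequence and structure prediction heads; that arrangement is likewise an assumption of this sketch.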

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac8b/11522565/e490130f4d66/41598_2024_77172_Fig1_HTML.jpg