
RiNALMo: general-purpose RNA language models can generalize well on structure prediction tasks.

Authors

Penić Rafael Josip, Vlašić Tin, Huber Roland G, Wan Yue, Šikić Mile

Affiliations

Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia.

Genome Institute of Singapore (GIS), Agency for Science Technology and Research (A*STAR), Singapore, Republic of Singapore.

Publication

Nat Commun. 2025 Jul 1;16(1):5671. doi: 10.1038/s41467-025-60872-5.

Abstract

While RNA has recently been recognized as an interesting small-molecule drug target, many challenges remain to be addressed before we can take full advantage of it. This underscores the need to improve our understanding of RNA structures and functions. Over the years, sequencing technologies have produced an enormous amount of unlabeled RNA data that holds huge potential. Motivated by the successes of protein language models, we introduce the RiboNucleic Acid Language Model (RiNALMo) to unveil the hidden code of RNA. RiNALMo is the largest RNA language model to date, with 650M parameters pre-trained on 36M non-coding RNA sequences from several databases. It extracts hidden knowledge and captures the structural information implicitly embedded within RNA sequences. RiNALMo achieves state-of-the-art results on several downstream tasks. Notably, we show that its generalization capability overcomes the inability of other deep learning methods for secondary structure prediction to generalize to unseen RNA families.
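To illustrate the pre-training setup the abstract describes, the sketch below shows the masked-token input preparation commonly used by BERT-style language models, applied to an RNA sequence. This is a minimal, hypothetical illustration assuming per-nucleotide tokenization and a standard masking objective; the function and names here are illustrative and are not RiNALMo's actual API.

```python
import random

VOCAB = ["A", "C", "G", "U"]
MASK = "<mask>"

def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Prepare one masked-language-model training example.

    Each nucleotide is a token; roughly `mask_rate` of the tokens are
    replaced with a mask symbol, and the model is trained to recover
    the original nucleotide at each masked position from its context.
    Returns (masked_tokens, targets) where `targets` maps position ->
    ground-truth nucleotide.
    """
    rng = rng or random.Random(0)
    tokens = list(seq)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok   # label the model must predict
            tokens[i] = MASK   # hide the nucleotide from the model
    return tokens, targets

masked, targets = mask_sequence("GGGAAACUUUCCC", mask_rate=0.3)
```

By solving this fill-in-the-blank objective over millions of unlabeled sequences, a model is pushed to internalize statistical regularities such as base-pairing constraints, which is one plausible route by which structural information becomes implicitly encoded in the learned representations.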
