An interpretable RNA foundation model for exploring functional RNA motifs in plants.

Author information

Yu Haopeng, Yang Heng, Sun Wenqing, Yan Zongyun, Yang Xiaofei, Zhang Huakun, Ding Yiliang, Li Ke

Affiliations

Key Laboratory of Molecular Epigenetics of the Ministry of Education, Northeast Normal University, Changchun, China.

Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich, UK.

Publication information

Nat Mach Intell. 2024;6(12):1616-1625. doi: 10.1038/s42256-024-00946-z. Epub 2024 Dec 9.

Abstract

The complex 'language' of plant RNA encodes a vast array of biological regulatory elements that orchestrate crucial aspects of plant growth, development and adaptation to environmental stresses. Recent advancements in foundation models (FMs) have demonstrated their unprecedented potential to decipher complex 'language' in biology. In this study, we introduced PlantRNA-FM, a high-performance and interpretable RNA FM specifically designed for plants. PlantRNA-FM was pretrained on an extensive dataset, integrating RNA sequences and RNA structure information from 1,124 distinct plant species. PlantRNA-FM exhibits superior performance in plant-specific downstream tasks. PlantRNA-FM achieves an F1 score of 0.974 for genic region annotation, whereas the current best-performing model achieves 0.639. Our PlantRNA-FM is empowered by our interpretable framework that facilitates the identification of biologically functional RNA sequence and structure motifs, including both RNA secondary and tertiary structure motifs across transcriptomes. Through experimental validations, we revealed translation-associated RNA motifs in plants. Our PlantRNA-FM also highlighted the importance of the position information of these functional RNA motifs in genic regions. Taken together, our PlantRNA-FM facilitates the exploration of functional RNA motifs across the complexity of transcriptomes, empowering plant scientists with capabilities for programming RNA codes in plants.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e7be/11652376/c2837555f6e7/42256_2024_946_Fig1_HTML.jpg
