Foundation models in plant molecular biology: advances, challenges, and future directions.

Authors

Xu Feng, Wu Tianhao, Cheng Qian, Wang Xiangfeng, Yan Jun

Affiliation

Frontiers Science Center for Molecular Design Breeding, State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China.

Publication information

Front Plant Sci. 2025 Jun 3;16:1611992. doi: 10.3389/fpls.2025.1611992. eCollection 2025.

Abstract

A foundation model (FM) is a neural network trained on large-scale data using unsupervised or self-supervised learning, capable of adapting to a wide range of downstream tasks. This review provides a comprehensive overview of FMs in plant molecular biology, emphasizing recent advances and future directions. It begins by tracing the evolution of biological FMs across the DNA, RNA, protein, and single-cell levels, from tools inspired by natural language processing (NLP) to transformative models for decoding complex biological sequences. The review then focuses on plant-specific FMs such as GPN, AgroNT, PDLLMs, PlantCaduceus, and PlantRNA-FM, which address challenges that are widespread among plant genomes, including polyploidy, high repetitive sequence content, and environment-responsive regulatory elements, alongside universal FMs like GENERator and Evo 2, which leverage extensive cross-species training data for sequence design and prediction of mutation effects. Key opportunities and challenges in plant molecular biology FM development are further outlined, such as data heterogeneity, biologically informed architectures, cross-species generalization, and computational efficiency. Future research should prioritize improvements in model generalization, multi-modal data integration, and computational optimization to overcome existing limitations and unlock the potential of FMs in plant science. This review serves as an essential resource for plant molecular biologists and offers a clear snapshot of the current state and future potential of FMs in the field.
