Foundation models in plant molecular biology: advances, challenges, and future directions.

Authors

Xu Feng, Wu Tianhao, Cheng Qian, Wang Xiangfeng, Yan Jun

Affiliation

Frontiers Science Center for Molecular Design Breeding, State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, China.

Publication information

Front Plant Sci. 2025 Jun 3;16:1611992. doi: 10.3389/fpls.2025.1611992. eCollection 2025.

DOI: 10.3389/fpls.2025.1611992

PMID: 40530265

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12170578/

Abstract

A foundation model (FM) is a neural network trained on large-scale data using unsupervised or self-supervised learning, capable of adapting to a wide range of downstream tasks. This review provides a comprehensive overview of FMs in plant molecular biology, emphasizing recent advances and future directions. It begins by tracing the evolution of biological FMs across the DNA, RNA, protein, and single-cell levels, from tools inspired by natural language processing (NLP) to transformative models for decoding complex biological sequences. The review then focuses on plant-specific FMs such as GPN, AgroNT, PDLLMs, PlantCaduceus, and PlantRNA-FM, which address challenges that are widespread among plant genomes, including polyploidy, high repetitive sequence content, and environment-responsive regulatory elements, alongside universal FMs like GENERator and Evo 2, which leverage extensive cross-species training data for sequence design and prediction of mutation effects. Key opportunities and challenges in plant molecular biology FM development are further outlined, such as data heterogeneity, biologically informed architectures, cross-species generalization, and computational efficiency. Future research should prioritize improvements in model generalization, multi-modal data integration, and computational optimization to overcome existing limitations and unlock the potential of FMs in plant science. This review serves as an essential resource for plant molecular biologists and offers a clear snapshot of the current state and future potential of FMs in the field.
