Progress and opportunities of foundation models in bioinformatics.

Affiliations

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China.

Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, 518120, China.

Publication details

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae548.

DOI:10.1093/bib/bbae548
PMID:39461902
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11512649/
Abstract

Bioinformatics has undergone a paradigm shift in artificial intelligence (AI), particularly through foundation models (FMs), which address longstanding challenges in bioinformatics such as limited annotated data and data noise. These AI techniques have demonstrated remarkable efficacy across various downstream validation tasks, effectively representing diverse biological entities and heralding a new era in computational biology. The primary goal of this survey is to conduct a general investigation and summary of FMs in bioinformatics, tracing their evolutionary trajectory, current research landscape, and methodological frameworks. Our primary focus is on elucidating the application of FMs to specific biological problems, offering insights to guide the research community in choosing appropriate FMs for tasks like sequence analysis, structure prediction, and function annotation. Each section delves into the intricacies of the targeted challenges, contrasting the architectures and advancements of FMs with conventional methods and showcasing their utility across different biological domains. Further, this review scrutinizes the hurdles and constraints encountered by FMs in biology, including issues of data noise, model interpretability, and potential biases. This analysis provides a theoretical groundwork for understanding the circumstances under which certain FMs may exhibit suboptimal performance. Lastly, we outline prospective pathways and methodologies for the future development of FMs in biological research, facilitating ongoing innovation in the field. This comprehensive examination not only serves as an academic reference but also as a roadmap for forthcoming explorations and applications of FMs in biology.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6a3/11512649/5bb8051a8ec4/bbae548f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6a3/11512649/9f112b553991/bbae548f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e6a3/11512649/2d13873a8cbb/bbae548f3.jpg