Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China.
Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, 518120, China.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae548.
Bioinformatics has undergone a paradigm shift driven by artificial intelligence (AI), particularly foundation models (FMs), which address longstanding challenges in the field such as limited annotated data and data noise. These AI techniques have demonstrated remarkable efficacy across a range of downstream validation tasks, effectively representing diverse biological entities and heralding a new era in computational biology. The primary goal of this survey is to investigate and summarize FMs in bioinformatics, tracing their evolutionary trajectory, current research landscape, and methodological frameworks. Our main focus is on elucidating how FMs are applied to specific biological problems, offering insights to guide the research community in choosing appropriate FMs for tasks such as sequence analysis, structure prediction, and function annotation. Each section examines the intricacies of the targeted challenges, contrasts the architectures and advances of FMs with those of conventional methods, and showcases their utility across different biological domains. Further, this review scrutinizes the hurdles and constraints that FMs encounter in biology, including data noise, model interpretability, and potential biases. This analysis provides a theoretical groundwork for understanding the circumstances under which certain FMs may perform suboptimally. Lastly, we outline prospective pathways and methodologies for the future development of FMs in biological research, to facilitate ongoing innovation in the field. This comprehensive examination serves not only as an academic reference but also as a roadmap for forthcoming explorations and applications of FMs in biology.