Zhu Sisi, Xu Hongquan, Liu Yuhong, Hong Yanfeng, Yang Haowen, Zhou Changli, Tao Lin
Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China.
School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, China.
Biotechnol Adv. 2025 Mar-Apr;79:108532. doi: 10.1016/j.biotechadv.2025.108532. Epub 2025 Feb 7.
Biosynthetic gene clusters (BGCs) are groups of physically clustered genes, found in bacteria, fungi, and some plants and animals, that are crucial for synthesizing secondary metabolites. In recent years, genome mining of BGCs has become a prominent research focus, particularly in natural product discovery and drug development. Compared with traditional experimental methods, computational techniques have markedly improved the efficiency of BGC identification and annotation, thereby facilitating the discovery of novel metabolites. The advent of artificial intelligence, particularly machine learning models and more advanced deep learning algorithms, has increased both the speed and the precision of BGC mining. This review gives a comprehensive introduction to the BGC databases and prediction tools developed to date and highlights the potential of machine learning in BGC mining. It also summarizes the challenges that computational methods face in this area and discusses future research directions.