Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.
Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States.
ACS Synth Biol. 2023 Sep 15;12(9):2650-2662. doi: 10.1021/acssynbio.3c00234. Epub 2023 Aug 22.
Natural products (NPs) produced by microorganisms and plants are a major source of drugs, herbicides, and fungicides. Thanks to recent advances in DNA sequencing, bioinformatics, and genome mining tools, a vast amount of data on NP biosynthesis has been generated, and these data are increasingly exploited to develop machine learning (ML) tools for NP discovery. In this review, we discuss the latest advances in developing and applying ML tools to explore the potential NPs encoded in genomic language and to predict the types of bioactivity of NPs. We also examine the technical challenges associated with developing and applying ML tools for NP research.