Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea.
KAIST Institute for Artificial Intelligence, KAIST, Daejeon 34141, Republic of Korea.
Nat Prod Rep. 2021 Nov 17;38(11):1954-1966. doi: 10.1039/d1np00016k.
Covering: 2016 to 2021
Discovery of novel natural products has been greatly facilitated by advances in genome sequencing, genome mining and analytical techniques. As a result, the volume of natural product data has grown over the years and has begun to serve as a resource for developing machine learning models. In the past few years, a number of machine learning models have been developed to examine various aspects of a molecule by effectively processing its molecular structure. Such machine learning approaches can help in understanding the biological effects of natural products. In this context, this Highlight reviews recent studies on machine learning models developed to infer various biological effects of molecules. Particular attention is paid to molecular featurization, the computational representation of a molecular structure, which is an essential step in developing a machine learning model. Technical challenges associated with applying machine learning to natural products are also discussed.
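As a rough illustration of what molecular featurization means, the sketch below maps a SMILES string to a fixed-length bit vector and compares two such vectors. It is a deliberately simplified, hypothetical stand-in: it hashes character n-grams of the SMILES text rather than enumerating chemical substructures as the Morgan/ECFP-style fingerprints commonly used in practice do, and it is not a method taken from the reviewed studies. The function names `featurize_smiles` and `tanimoto` and the parameters `n_bits` and `max_n` are illustrative assumptions.

```python
import zlib


def featurize_smiles(smiles: str, n_bits: int = 64, max_n: int = 3) -> list[int]:
    """Hypothetical toy featurization: hash every character n-gram
    (lengths 1..max_n) of a SMILES string into one of n_bits positions
    and set that bit to 1. Not a real chemical fingerprint."""
    bits = [0] * n_bits
    for n in range(1, max_n + 1):
        for i in range(len(smiles) - n + 1):
            fragment = smiles[i:i + n]
            # zlib.crc32 is deterministic across runs, unlike Python's built-in hash()
            index = zlib.crc32(fragment.encode("utf-8")) % n_bits
            bits[index] = 1
    return bits


def tanimoto(a: list[int], b: list[int]) -> float:
    """Tanimoto (Jaccard) similarity between two equal-length bit vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


if __name__ == "__main__":
    caffeine = featurize_smiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
    theobromine = featurize_smiles("CN1C=NC2=C1C(=O)NC(=O)N2C")
    print(tanimoto(caffeine, theobromine))
```

The fixed length is the key design point: every molecule, whatever its size, becomes a vector of the same dimension, which is the form most machine learning models expect as input.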