Department of Bioengineering, University of Illinois, Urbana-Champaign, Illinois 61801, United States.
Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana-Champaign, Illinois 61801, United States.
ACS Appl Bio Mater. 2024 Feb 19;7(2):657-684. doi: 10.1021/acsabm.3c00054. Epub 2023 Aug 3.
Initially part of the field of artificial intelligence, machine learning (ML) has become a booming research area since branching out into its own field in the 1990s. After three decades of refinement, ML algorithms have accelerated scientific developments across a variety of research topics. The field of small molecule design is no exception, and an increasing number of researchers are applying ML techniques in their pursuit of discovering, generating, and optimizing small molecule compounds. The goal of this review is to provide simple, yet descriptive, explanations of some of the most commonly utilized ML algorithms in the field of small molecule design, along with those that are highly applicable to an experimentally focused audience. The algorithms discussed here span three ML paradigms: supervised learning, unsupervised learning, and ensemble methods. Examples from the published literature are provided for each algorithm. Some common pitfalls of applying ML to biological and chemical data sets are also explained, alongside a brief summary of a few more advanced paradigms, including reinforcement learning and semi-supervised learning.