Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany.
Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland.
J Med Chem. 2021 Dec 23;64(24):17744-17752. doi: 10.1021/acs.jmedchem.1c01789. Epub 2021 Dec 13.
The prediction of compound properties from chemical structure is a main task for machine learning (ML) in medicinal chemistry. ML is often applied to large data sets in applications such as compound screening, virtual library enumeration, or generative chemistry. Although desirable, a detailed understanding of ML model decisions is typically not required in these cases. By contrast, compound optimization efforts rely on small data sets to identify structural modifications leading to desired property profiles. In this situation, if ML is applied, one is usually reluctant to make decisions based on predictions that cannot be rationalized. Only a few ML methods are inherently interpretable. However, explanatory approaches can be applied to yield insights into complex ML model decisions. Herein, methodologies for better understanding ML models or explaining individual predictions are reviewed, and current challenges in integrating ML into medicinal chemistry programs, as well as future opportunities, are discussed.