Ash Jeremy R, Wognum Cas, Rodríguez-Pérez Raquel, Aldeghi Matteo, Cheng Alan C, Clevert Djork-Arné, Engkvist Ola, Fang Cheng, Price Daniel J, Hughes-Oliver Jacqueline M, Walters W Patrick
Johnson & Johnson Innovative Medicine, Spring House, Pennsylvania 19477, United States.
Valence Laboratories, Montréal, Québec H2S 3G6, Canada.
J Chem Inf Model. 2025 Sep 22;65(18):9398-9411. doi: 10.1021/acs.jcim.5c01609. Epub 2025 Sep 11.
Machine learning (ML) methods that relate molecular structure to properties are frequently proposed as in silico surrogates for expensive or time-consuming experiments. In small molecule drug discovery, such methods inform high-stakes decisions like compound synthesis and in vivo studies. This application lies at the intersection of multiple scientific disciplines. When comparing new ML methods to baseline or state-of-the-art approaches, statistically rigorous method comparison protocols and domain-appropriate performance metrics are essential to ensure replicability and, ultimately, the adoption of ML in small molecule drug discovery. This paper proposes a set of guidelines to incentivize rigorous and domain-appropriate techniques for method comparison tailored to small molecule property modeling. These guidelines, accompanied by annotated examples using open-source software tools, lay a foundation for robust ML benchmarking and thus the development of more impactful methods.