Medicinal Chemistry, Biogen, Cambridge, Massachusetts 02142, United States.
DMPK, Biogen, Cambridge, Massachusetts 02142, United States.
J Chem Inf Model. 2023 Jun 12;63(11):3263-3274. doi: 10.1021/acs.jcim.3c00160. Epub 2023 May 22.
Absorption, distribution, metabolism, and excretion (ADME), which collectively define the concentration profile of a drug at the site of action, are of critical importance to the success of a drug candidate. Recent advances in machine learning algorithms and the availability of larger proprietary and public ADME data sets have renewed interest within the academic and pharmaceutical science communities in predicting pharmacokinetic and physicochemical endpoints in early drug discovery. In this study, we collected 120 internal prospective data sets over 20 months across six in vitro ADME endpoints: human and rat liver microsomal stability, MDR1-MDCK efflux ratio, solubility, and human and rat plasma protein binding. A variety of machine learning algorithms in combination with different molecular representations were evaluated. Our results suggest that gradient-boosted decision tree and deep learning models consistently outperformed random forest over time. We also observed better performance when models were retrained on a fixed schedule, and more frequent retraining generally increased accuracy, whereas hyperparameter tuning improved the prospective predictions only marginally.
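The prospective evaluation with scheduled retraining described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`fit_mean_model`, `prospective_mae`), the `retrain_every` parameter, and the toy data are all hypothetical, and a trivial mean predictor stands in for the gradient-boosted and deep learning models. The sketch only shows the general idea of scoring each new batch of measurements before that batch is added to the training pool, and refitting on a fixed schedule.

```python
from statistics import mean


def fit_mean_model(features, labels):
    """Stand-in learner (assumption): predicts the mean training label for any input."""
    center = mean(labels)
    return lambda rows: [center for _ in rows]


def prospective_mae(batches, fit, retrain_every):
    """Score each new batch before it is used for training (time-split evaluation).

    batches: chronological list of (features, labels); batches[0] seeds the model.
    retrain_every: refit on all data seen so far after this many new batches.
    Returns the mean absolute error of each prospective batch.
    """
    train_x, train_y = list(batches[0][0]), list(batches[0][1])
    model = fit(train_x, train_y)
    scores = []
    for step, (features, labels) in enumerate(batches[1:], start=1):
        # Predict first: the model has never seen this batch.
        preds = model(features)
        scores.append(mean(abs(p - y) for p, y in zip(preds, labels)))
        # Only afterwards does the batch join the training pool.
        train_x.extend(features)
        train_y.extend(labels)
        if step % retrain_every == 0:
            model = fit(train_x, train_y)
    return scores


# Toy data (assumption): the measured endpoint drifts upward by one unit per batch.
batches = [([[m]], [float(m)]) for m in range(6)]

monthly = prospective_mae(batches, fit_mean_model, retrain_every=1)
quarterly = prospective_mae(batches, fit_mean_model, retrain_every=3)

print("retrain every batch:   ", monthly, "mean", mean(monthly))
print("retrain every 3 batches:", quarterly, "mean", mean(quarterly))
```

On this drifting toy series, refitting after every batch gives a lower average prospective error than refitting every third batch, which mirrors the direction of the retraining-frequency effect reported in the abstract. The numbers themselves carry no meaning beyond the toy setup.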