Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
Toxicology & DMPK Research Department, Teijin Institute for Bio-medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-shi, Tokyo 191-8512, Japan.
Mol Pharm. 2023 Jun 5;20(6):3060-3072. doi: 10.1021/acs.molpharmaceut.3c00071. Epub 2023 Apr 25.
Pharmacokinetic (PK) parameters such as clearance (CL) and volume of distribution (Vd) have been the subject of previous predictive models. However, explicit knowledge of the concentration-time profile provides additional value, such as time above the minimum inhibitory concentration (MIC) or the area under the curve (AUC), for understanding both the efficacy and safety-related aspects of a compound. In this work, we developed machine learning models for plasma concentration-time profiles after both intravenous (i.v.) and oral (p.o.) dosing for a series of 17 in-house projects. As explanatory variables, MACCS Keys chemical descriptors as well as in silico and experimental PK parameters were used. The predictive accuracy of random forest (RF), a message passing neural network, 2-compartment models using estimated CL and Vd at steady state (Vdss), and an average model (as a control experiment) was investigated using 5-fold cross-validation (5-fold CV) and leave-one-project-out validation (LOPO-V). In 5-fold CV, RF gave the best predictive accuracy among the models studied for both i.v. and p.o. plasma concentration-time profiles, with RMSE values for i.v. dosing at 0.08, 1, and 8 h of 0.245, 0.474, and 0.462, respectively, and for p.o. dosing at 0.25, 1, and 8 h of 0.500, 0.612, and 0.509, respectively. Furthermore, by investigating the importance of the PK parameters using the Gini index, we observed that general prior knowledge in ADME research was well reflected in the respective feature importances: predicted human Vd (hVd) for the initial distribution, mouse intrinsic CL and the unbound fraction in mouse plasma for the elimination process, and Caco2 permeability for the absorption process. This model is also the first that can predict twin peaks in the concentration-time profile much better than a baseline compartment model. Because it combines sufficient accuracy with fast prediction, we found the model to be fit-for-purpose for practical lead optimization.
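The leave-one-project-out validation and the average-model control mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the project labels, and the concentration values are hypothetical, and the baseline is assumed to predict the training-set mean at a single time point for every held-out compound.

```python
import math
from collections import defaultdict


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def leave_one_project_out(project_ids):
    """Yield (held_out_project, train_idx, test_idx), holding out one project per fold."""
    by_project = defaultdict(list)
    for i, pid in enumerate(project_ids):
        by_project[pid].append(i)
    for pid in sorted(by_project):
        test_idx = by_project[pid]
        train_idx = [i for i, other in enumerate(project_ids) if other != pid]
        yield pid, train_idx, test_idx


def average_model_lopo(project_ids, y):
    """Score an average-model baseline under LOPO: predict the training mean for each held-out compound."""
    scores = {}
    for pid, train_idx, test_idx in leave_one_project_out(project_ids):
        train_mean = sum(y[i] for i in train_idx) / len(train_idx)
        y_test = [y[i] for i in test_idx]
        scores[pid] = rmse(y_test, [train_mean] * len(y_test))
    return scores


# Hypothetical toy data: one project label and one concentration value
# (arbitrary units, single time point) per compound.
projects = ["A", "A", "B", "B", "C", "C"]
conc = [1.0, 1.2, 0.4, 0.6, 2.0, 2.2]

lopo_rmse = average_model_lopo(projects, conc)
for pid, score in lopo_rmse.items():
    print(f"held-out project {pid}: RMSE = {score:.3f}")
```

Unlike a random 5-fold split, each fold here contains no compounds from the held-out project, so the score reflects how well a model transfers to a chemical series it has never seen. Any learned model can be swapped in for the training mean as long as it is fitted on `train_idx` only.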