Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, 4070 Basel, Switzerland.
J Chem Inf Model. 2023 Jan 23;63(2):442-458. doi: 10.1021/acs.jcim.2c01134. Epub 2023 Jan 3.
Although computational predictions of pharmacokinetics (PK) are desirable at the drug design stage, existing approaches are often limited by prediction accuracy and human interpretability. Using a discovery data set of mouse and rat PK studies at Roche (9,685 unique compounds), we performed a proof-of-concept study to predict key PK properties from chemical structure alone: plasma clearance (CLp), volume of distribution at steady state (Vss), and oral bioavailability (F). Ten machine learning (ML) models were evaluated, including single-task, multitask, and transfer learning approaches (i.e., pretraining with in vitro data). In addition to prediction accuracy, we emphasized the human interpretability of outcomes, especially the quantification of uncertainty, applicability domains, and explanations of predictions in terms of molecular features. Results show that intravenous (IV) PK properties (CLp and Vss) can be predicted with good precision (average absolute fold error, AAFE, of 1.96-2.84 depending on the data split) and low bias (average fold error, AFE, of 0.98-1.36), with AutoGluon, Gaussian Process Regressor (GP), and ChemProp displaying the best performance. Owing to the higher complexity of oral PK studies, predictions of F were more challenging, with best AAFE values of 2.35-2.60 and a larger overprediction bias (AFE of 1.45-1.62). Multitask approaches and pretraining of ChemProp neural networks with in vitro data showed precision similar to that of single-task models but helped reduce bias and increase correlations between observations and predictions. A combination of GP-computed prediction variance, molecular clustering, and dimensionality reduction provided valuable quantitative insights into prediction uncertainty and applicability domains. SHapley Additive exPlanations (SHAP) highlighted molecular features contributing to Vss predictions, providing explanations that could aid drug design.
Combined results show that computational predictions of PK are feasible at the drug design stage, with several ML technologies converging to successfully leverage historical PK data sets. Further studies are needed to unlock the full potential of this approach, especially with respect to data set size and quality, transfer learning between in vitro and in vivo data sets, model-independent quantification of uncertainty, and explainability of predictions.
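The abstract reports precision as AAFE and bias as AFE but does not define them. The sketch below assumes the definitions commonly used in PK prediction, which may not match the authors' exact implementation. AFE is 10 raised to the mean of log10(predicted/observed), and AAFE is 10 raised to the mean of |log10(predicted/observed)|. The function names `afe` and `aafe` are hypothetical. The sketch shows why the two metrics capture different things: in AFE, overpredictions and underpredictions cancel, so it measures systematic bias (values above 1 indicate overprediction). In AAFE they do not cancel, so it measures typical error magnitude (1.0 would be perfect).

```python
import math
from typing import Sequence


def _log10_ratios(predicted: Sequence[float], observed: Sequence[float]) -> list[float]:
    """Return log10(predicted / observed) for each paired value."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(p <= 0 for p in predicted) or any(o <= 0 for o in observed):
        raise ValueError("fold errors require strictly positive values")
    return [math.log10(p / o) for p, o in zip(predicted, observed)]


def afe(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average fold error (assumed definition): signed, so >1 means net overprediction."""
    logs = _log10_ratios(predicted, observed)
    return 10 ** (sum(logs) / len(logs))


def aafe(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute fold error (assumed definition): unsigned, 1.0 is a perfect fit."""
    logs = _log10_ratios(predicted, observed)
    return 10 ** (sum(abs(x) for x in logs) / len(logs))


if __name__ == "__main__":
    # Illustrative values only, not data from the study:
    # one 2-fold overprediction, one 2-fold underprediction, one exact hit.
    observed = [10.0, 20.0, 5.0]
    predicted = [20.0, 10.0, 5.0]
    print(afe(predicted, observed))   # ~1.0: the two 2-fold errors cancel (no net bias)
    print(aafe(predicted, observed))  # ~1.587 (2 ** (2/3)): the errors do not cancel
```

This contrast mirrors the abstract's reporting. An AFE near 1 (e.g., 0.98 for IV properties) indicates little systematic bias even when AAFE is around 2. An AFE of 1.45-1.62 for F indicates predictions that are, on average, too high.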