School of Computing, University of Kent, Canterbury, CT2 7NF UK.
Medway School of Pharmacy, Universities of Kent and Greenwich, Chatham, Kent, ME4 4TB UK.
J Cheminform. 2015 Feb 26;7:6. doi: 10.1186/s13321-015-0054-x. eCollection 2015.
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of estimating the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from data mining (machine learning), and discusses the pros and cons of several types of such methods. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), which are often used in physiologically-based pharmacokinetics. The work therefore assesses whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values.
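As a rough illustration of the kind of method described above, the following sketch bags regression trees over a feature vector that combines a molecular descriptor with a predicted Kt:p value. It is a minimal sketch and not the authors' implementation: the trees are depth-1 stumps for brevity, the feature names (logP, adipose Kt:p), the toy data and all function names are assumptions made for illustration, and the target is taken to be log10 Vss.

```python
import random
import statistics

def fit_stump(rows, targets):
    """Fit a depth-1 regression tree by choosing the split with the lowest squared error."""
    best = None
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[j] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[j] > threshold]
            if not left or not right:
                continue  # skip splits that leave one side empty
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    if best is None:
        # No valid split (all sampled rows identical): predict the mean everywhere.
        m = statistics.fmean(targets)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    j, threshold, left_mean, right_mean = stump
    return left_mean if row[j] <= threshold else right_mean

def fit_bagging(rows, targets, n_trees=25, seed=0):
    """Bagging: fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx]))
    return ensemble

def predict_bagging(ensemble, row):
    """Average the predictions of all trees in the ensemble."""
    return statistics.fmean([predict_stump(s, row) for s in ensemble])

# Hypothetical toy data: [logP, predicted adipose Kt:p] -> log10(Vss in L/kg).
rows = [[0.5, 0.2], [1.0, 0.4], [2.0, 1.5], [3.0, 3.0], [3.5, 4.0], [4.0, 6.0]]
log_vss = [-0.5, -0.3, 0.0, 0.3, 0.5, 0.7]

ensemble = fit_bagging(rows, log_vss)
pred_low = predict_bagging(ensemble, [0.5, 0.2])
pred_high = predict_bagging(ensemble, [4.0, 6.0])
```

Dropping the second column from `rows` gives the descriptor-only variant, so the same code can be used to compare the two feature sets in the way the abstract describes.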
Comparing models that used only molecular descriptors, in particular the Bagging decision tree (mean fold error of 2.33), with models that used predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that predicted Kt:p values may be beneficial as descriptors for accurate decision tree-based prediction of Vss, provided that prior feature selection is applied.
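The abstract reports accuracy as a mean fold error but does not give its formula. The sketch below assumes the commonly used geometric definition, 10 raised to the mean of |log10(predicted/observed)|. The function name and the example values are hypothetical and are not taken from the paper.

```python
import math

def mean_fold_error(observed, predicted):
    """Geometric mean fold error: 10 ** mean(|log10(predicted / observed)|).

    Assumed definition; the abstract does not state the formula used.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    log_errors = [abs(math.log10(p / o)) for o, p in zip(observed, predicted)]
    return 10 ** (sum(log_errors) / len(log_errors))

# Hypothetical Vss values in L/kg, for illustration only.
observed_vss = [0.5, 1.0, 4.0]
predicted_vss = [1.0, 1.0, 2.0]
mfe = mean_fold_error(observed_vss, predicted_vss)
```

Under this definition a value of 1 means perfect prediction, and a value near 2.3 means predictions are off by a factor of about 2.3 on average, in either direction.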
The decision tree-based models presented in this work have an accuracy that is reasonable and similar to the accuracy of Vss inter-species extrapolations reported in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that can integrate large and varied sources of data, and from flexible non-linear data mining methods such as decision trees, which can produce interpretable models.
Graphical Abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.