Medway School of Pharmacy, Universities of Kent and Greenwich, Central Avenue, Chatham Maritime, Kent ME4 4TB, UK.
Bioimpacts. 2013;3(1):21-7. doi: 10.5681/bi.2013.011. Epub 2013 Feb 21.
The prediction of plasma protein binding (ppb) is of paramount importance in the pharmacokinetic characterization of drugs, as ppb causes significant changes in volume of distribution, clearance and drug half-life. This study utilized Quantitative Structure-Activity Relationships (QSAR) for the prediction of plasma protein binding.
Protein binding values for 794 compounds were collated from the literature. The data were partitioned into a training set of 662 compounds and an external validation set of 132 compounds. Physicochemical and molecular descriptors were calculated for each compound using the ACD labs/logD, MOE (Chemical Computing Group) and Symyx QSAR software packages. Several data mining tools were employed to construct the models: stepwise regression analysis, Classification and Regression Trees (CART), boosted trees and Random Forest.
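The partition described above can be illustrated with a minimal sketch. It assumes a simple seeded random hold-out, which the abstract does not state; the function name `split_train_validation`, the seed, and the integer compound identifiers are hypothetical stand-ins rather than the authors' procedure.

```python
import random


def split_train_validation(items, n_validation, seed=0):
    """Randomly hold out n_validation items; the remainder is the training set.

    Assumption: a plain random split. The abstract does not say how the
    794 compounds were actually assigned to the two sets.
    """
    if not 0 <= n_validation <= len(items):
        raise ValueError("n_validation must be between 0 and len(items)")
    shuffled = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical identifiers standing in for the 794 collated compounds.
compound_ids = list(range(794))
train_ids, validation_ids = split_train_validation(compound_ids, n_validation=132)
print(len(train_ids), len(validation_ids))
```

Holding the validation compounds out before any model fitting is what makes the validation set external: no descriptor selection or tree growing sees those compounds.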
Several predictive models were identified; however, one model in particular produced significantly superior prediction accuracy for the external validation set, as measured by mean absolute error and correlation coefficient. The selected model was a boosted regression tree model with a mean absolute error of 13.25 for the training set and 14.96 for the validation set.
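The two accuracy measures named above can be computed as in the following sketch. It assumes the correlation coefficient is the Pearson coefficient between observed and predicted values, which the abstract does not specify; the function names and the four observed/predicted values are invented for illustration and are not data from the study.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average of |observed - predicted| over all compounds."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Invented illustrative values, not results from the study.
observed = [95.0, 40.0, 99.0, 10.0]
predicted = [90.0, 55.0, 92.0, 20.0]
mae = mean_absolute_error(observed, predicted)
r = pearson_r(observed, predicted)
print(mae, r)
```

Reporting both measures is useful because they capture different things: the mean absolute error gives the typical size of a prediction error, while the correlation coefficient indicates how well the predictions preserve the ranking and spread of the observed values.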
Plasma protein binding can be modeled using simple regression trees or multiple linear regressions with reasonable accuracy. These interpretable models identified the molecular factors governing high ppb, which included hydrophobicity, van der Waals surface area parameters, and aromaticity. On the other hand, the more complicated ensemble method of boosted regression trees produced the most accurate ppb estimations for the external validation set.