Hudelson Matthew G, Jones Jeffrey P
Department of Mathematics, Washington State University, P.O. Box 643113, Pullman, 99164-3113, USA.
J Med Chem. 2006 Jul 13;49(14):4367-73. doi: 10.1021/jm0601553.
A new method, called line-walking recursive partitioning (LWRP), is described for partitioning diverse structures on the basis of chemical properties; it uses only nine descriptors of molecular shape, polarizability, and charge. We use a training set of over 600 compounds and a validation set of 100 compounds for the cytochrome P450 enzymes 2C9, 2D6, and 3A4. The LWRP algorithm incorporates elements of support vector machines (SVMs) and recursive partitioning while avoiding the linear or quadratic programming methods that SVMs require. We compare LWRP with a many-descriptor SVM model, using the same dataset as that described in the literature.(1) Using nine descriptors, the line-walking method predicted the validation set with about 84-90% accuracy, a success rate comparable to that of the SVM method. Furthermore, line-walking found errors in the inhibitor assignments for the 2C9 inhibitors in the validation set. When these errors are corrected, the model predicts with even higher accuracy. Although this method has been applied to P450 enzymes, it should be of general use in partitioning molecules on the basis of function.
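The abstract says only that LWRP combines elements of SVMs and recursive partitioning without linear or quadratic programming. It does not give the splitting rule. As a hedged illustration of that general idea, and explicitly not the authors' LWRP algorithm, the sketch below builds a recursive-partitioning tree whose node splits are linear hyperplanes found by the classic perceptron update instead of an SVM's quadratic program. The function names (`perceptron_split`, `build_tree`, `predict`), the +1/-1 label encoding, the `max_depth`/`min_size` stopping rules, and the synthetic nine-feature data are all assumptions made for illustration.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def perceptron_split(X: List[Vector], y: List[int], epochs: int = 100, lr: float = 1.0):
    """Fit a hyperplane w.x + b = 0 with the perceptron rule (labels +1 / -1).

    No linear or quadratic programming is involved. Unlike an SVM, the result
    is not a maximum-margin hyperplane; it separates the data only if the data
    are linearly separable and the loop converges within `epochs` passes.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            if label * (dot(w, x) + b) <= 0:  # misclassified or on the plane
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
                mistakes += 1
        if mistakes == 0:  # converged: every point is on its correct side
            break
    return w, b


@dataclass
class Node:
    label: Optional[int] = None      # set only on leaves
    w: Optional[Vector] = None       # hyperplane normal (internal nodes only)
    b: float = 0.0
    left: Optional["Node"] = None    # side where w.x + b < 0
    right: Optional["Node"] = None   # side where w.x + b >= 0


def build_tree(X, y, depth=0, max_depth=5, min_size=2) -> Node:
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_size:
        return Node(label=majority)
    w, b = perceptron_split(X, y)
    left = [i for i, x in enumerate(X) if dot(w, x) + b < 0]
    right = [i for i, x in enumerate(X) if dot(w, x) + b >= 0]
    if not left or not right:  # the hyperplane failed to partition this node
        return Node(label=majority)
    return Node(
        w=w,
        b=b,
        left=build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_size),
        right=build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_size),
    )


def predict(node: Node, x: Vector) -> int:
    # Walk down the tree, choosing a side of each node's hyperplane.
    while node.label is None:
        node = node.left if dot(node.w, x) + node.b < 0 else node.right
    return node.label


if __name__ == "__main__":
    # Synthetic stand-in for nine descriptors per compound (NOT real P450 data).
    rng = random.Random(0)

    def make(n):
        X = [[rng.uniform(-1, 1) for _ in range(9)] for _ in range(n)]
        # Hypothetical nonlinear labelling rule, so more than one split helps.
        y = [1 if (x[0] > 0) != (x[1] > 0) else -1 for x in X]
        return X, y

    X_train, y_train = make(600)
    X_val, y_val = make(100)
    tree = build_tree(X_train, y_train)
    acc = sum(predict(tree, x) == t for x, t in zip(X_val, y_val)) / len(y_val)
    print(f"validation accuracy: {acc:.2f}")
```

The perceptron is swapped in only because it is a simple hyperplane finder that needs no optimization solver. On data that is not linearly separable it stops after `epochs` passes, and the recursion then refines the partition on each side of the resulting line.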