Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.
PLoS One. 2013 Jun 17;8(6):e66566. doi: 10.1371/journal.pone.0066566. Print 2013.
A unified proteochemometric (PCM) model for predicting the ability of drug-like chemicals to inhibit five major drug-metabolizing CYP isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4) was created and made publicly available under the Bioclipse Decision Support open source system at www.cyp450model.org. For the proteochemometric modeling, we represented the chemical compounds by molecular signature descriptors and the CYP isoforms by an alignment-independent description of the composition and transition of amino acid properties in their primary protein sequences. The entire training dataset contained 63 391 interactions, and the best PCM model was obtained using signature descriptors of heights 1, 2 and 3 and inducing the model with a support vector machine. The model showed excellent predictive ability, with an internal AUC of 0.923 and an AUC of 0.940 on a large external dataset. One advantage of PCM models is their extensibility, which makes it possible to extend our model to new CYP isoforms and polymorphic CYP forms. A key benefit of PCM is that all proteins are contained in a single model, which generally makes it more stable and predictive than single-target models. The inclusion of the model in Bioclipse Decision Support makes it possible to obtain virtually instantaneous predictions (∼100 ms per prediction) while interactively drawing or modifying chemical structures in the Bioclipse chemical structure editor.
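To make the descriptor scheme concrete, the following is a minimal sketch of how an alignment-independent composition and transition description of a protein sequence might be computed and joined with a compound descriptor into a single PCM input row. It is not the published implementation. The three-class hydrophobicity grouping, the function names (`composition`, `transition`, `protein_descriptor`, `pcm_row`) and the toy inputs are all assumptions made for illustration; the abstract does not state which amino acid properties or class boundaries were used, and the signature descriptor generation and the support vector machine are not reproduced here.

```python
from itertools import combinations

# ASSUMPTION: a three-class hydrophobicity grouping of the 20 standard
# amino acids, chosen only for illustration. The abstract does not say
# which properties or class boundaries the authors used.
PROPERTY_CLASSES = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}

# Fixed class order so that every protein yields features in the same order.
CLASS_NAMES = sorted(PROPERTY_CLASSES)
RESIDUE_TO_CLASS = {
    residue: name
    for name, residues in PROPERTY_CLASSES.items()
    for residue in residues
}


def _classes(sequence):
    """Map each residue to its property class (KeyError on unknown letters)."""
    return [RESIDUE_TO_CLASS[residue] for residue in sequence.upper()]


def composition(sequence):
    """Fraction of residues that fall into each property class."""
    classes = _classes(sequence)
    if not classes:
        raise ValueError("sequence must not be empty")
    return [classes.count(name) / len(classes) for name in CLASS_NAMES]


def transition(sequence):
    """Fraction of adjacent residue pairs that switch between two classes.

    Each unordered pair of classes gives one feature, so a change from
    class A to B is counted together with a change from B to A.
    """
    classes = _classes(sequence)
    if len(classes) < 2:
        raise ValueError("sequence needs at least two residues")
    pairs = list(zip(classes, classes[1:]))
    features = []
    for a, b in combinations(CLASS_NAMES, 2):
        hits = sum(1 for pair in pairs if pair in ((a, b), (b, a)))
        features.append(hits / len(pairs))
    return features


def protein_descriptor(sequence):
    """Alignment-independent descriptor: composition followed by transition."""
    return composition(sequence) + transition(sequence)


def pcm_row(compound_descriptor, sequence):
    """One PCM input row: compound features followed by protein features.

    `compound_descriptor` stands in for signature descriptor counts,
    which this sketch does not compute.
    """
    return list(compound_descriptor) + protein_descriptor(sequence)
```

Because the protein features depend only on the sequence itself, a new isoform or polymorphic form can be described without aligning it to the others: calling `pcm_row(counts, new_sequence)` with the same compound counts yields a row of the same length that a trained classifier could score. Plain concatenation is the simplest way to combine the two descriptor blocks; whether the published model also used compound-protein cross-terms is not stated in the abstract.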