Computational Science Research Center, San Diego State University, San Diego, California, USA.
J Chem Inf Model. 2010 Oct 25;50(10):1759-71. doi: 10.1021/ci100200u.
Mutations that arise in HIV-1 protease after exposure to HIV-1 protease inhibitors have proved to be a persistent difficulty in the treatment of HIV. Mutations in the binding pocket of the protease can prevent the inhibitor from binding to the protein effectively. In the present study, the crystal structures of 68 HIV-1 proteases from the Protein Data Bank (PDB), each complexed with one of the nine FDA-approved protease inhibitors, were analyzed by (a) identifying the mutational changes with the aid of a mutation map developed for this purpose and (b) correlating the structure of the binding pockets with the complexed inhibitors. The mutations in each crystal structure were identified by comparing its amino acid sequence against that of the HIV-1 wild-type strain HXB2. These mutations were presented visually as a mutation map in order to analyze the mutation patterns corresponding to each protease inhibitor. The crystal structure (in vitro) mutation patterns for each inhibitor were compared against the mutation patterns observed in in vivo data, and the in vitro patterns were found to be representative of most of the major in vivo mutations. We then performed a data mining analysis of the binding pockets from each crystal structure, in terms of their chemical descriptors, to identify important structural features of HIV-1 protease with respect to the binding conformation of the inhibitors. The data mining analysis was performed using several classification techniques: Random Forest (RF), linear discriminant analysis (LDA), and logistic regression (LR). We developed two hybrid models, RF-LDA and RF-LR. In both, Random Forest was used as a feature selection proxy, reducing the descriptor space to the few descriptors that the classifier determined to be most relevant. These descriptors were then used to develop the subsequent LDA, LR, and hierarchical classification models. Clustering analysis of the binding pockets, based on the descriptors selected for the optimal classification models, revealed conformational similarities among the ligands in each cluster. This study provides important information for understanding structural features of HIV-1 protease that cannot be studied using existing in vivo genomic data sets.
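The mutation identification step described above (comparing each structure's sequence against a reference and collecting the differences into a map) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the sequences are already aligned to the reference with no insertions or deletions, and the function names `find_mutations` and `mutation_map`, the 10-residue reference string, and the structure identifiers are all hypothetical placeholders, not HXB2 data or PDB entries.

```python
from typing import Dict, List


def find_mutations(reference: str, sample: str, start: int = 1) -> List[str]:
    """List substitutions in `sample` relative to `reference`.

    Each substitution is written as <reference residue><position><observed
    residue>, for example "D3N". Assumes both sequences are pre-aligned and
    of equal length (no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    mutations = []
    for offset, (ref_aa, obs_aa) in enumerate(zip(reference, sample)):
        if ref_aa != obs_aa:
            mutations.append(f"{ref_aa}{start + offset}{obs_aa}")
    return mutations


def mutation_map(reference: str, structures: Dict[str, str],
                 start: int = 1) -> Dict[str, List[str]]:
    """Map each structure identifier to its list of substitutions."""
    return {
        struct_id: find_mutations(reference, seq, start)
        for struct_id, seq in structures.items()
    }


# Hypothetical toy data: a made-up 10-residue reference and three made-up
# structures. These are placeholders for illustration only.
TOY_REFERENCE = "ACDEFGHIKL"
TOY_STRUCTURES = {
    "struct_A": "ACDEFGHIKL",  # identical to the reference
    "struct_B": "ACNEFGHIKM",  # two substitutions
    "struct_C": "AVDEFGHIKL",  # one substitution
}

if __name__ == "__main__":
    for struct_id, muts in mutation_map(TOY_REFERENCE, TOY_STRUCTURES).items():
        print(struct_id, ", ".join(muts) if muts else "(no mutations)")
```

Grouping the resulting lists by the inhibitor bound in each structure would give the per-inhibitor mutation patterns that the abstract compares against in vivo data. Real sequences would first need a proper alignment step, which this sketch omits.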