Niu Bing, Lu Lin, Liu Liang, Gu Tian Hong, Feng Kai-Yan, Lu Wen-Cong, Cai Yu-Dong
School of Materials Science and Engineering, Shanghai University, 149 Yan-Chang Road, Shanghai 200072, People's Republic of China.
J Comput Chem. 2009 Jan 15;30(1):33-9. doi: 10.1002/jcc.21024.
Knowledge of the polyprotein cleavage sites of HIV protease refines our understanding of its specificity, and this information is useful for designing specific and efficient HIV protease inhibitors. Several recent studies have approached the HIV-1 protease specificity problem by applying a variety of classifier construction and combination methods. The search for suitable HIV protease inhibitors would be greatly accelerated by an accurate, robust, and rapid method for predicting the sites at which HIV protease cleaves proteins. In this article, we selected HIV-1 protease as the subject of study. A total of 299 oligopeptides were chosen as the training set, and another 63 oligopeptides were used as the test set. The peptides are represented by features constructed from AAIndex (Kawashima et al., Nucleic Acids Res 1999, 27, 368; Kawashima and Kanehisa, Nucleic Acids Res 2000, 28, 374). The mRMR method (Maximum Relevance, Minimum Redundancy; Ding and Peng, Proc Second IEEE Comput Syst Bioinformatics Conf 2003, 523; Peng et al., IEEE Trans Pattern Anal Mach Intell 2005, 27, 1226), combined with incremental feature selection (IFS) and feature forward search (FFS), was applied to find the two important cleavage sites and to select 364 important biochemical features, with the jackknife test used for evaluation. Using a KNN (K-nearest neighbors) classifier on the selected features, the prediction model achieves an accuracy of 91.3% in the jackknife cross-validation test and 87.3% in the independent-set test. We expect that our feature selection scheme can serve as a useful auxiliary technique for scientists in this field who are seeking effective inhibitors of HIV protease.
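The pipeline described above (rank features, add them one at a time, and score each subset with a jackknife KNN test) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a numeric feature matrix `X` and integer class labels `y`, and it uses absolute Pearson correlation as a simple stand-in for the mutual information that mRMR normally uses. The function names `loo_knn_accuracy`, `mrmr_style_ranking`, and `incremental_feature_selection` are hypothetical, chosen only for this example.

```python
import numpy as np


def loo_knn_accuracy(X, y, k=1):
    """Leave-one-out (jackknife) accuracy of a k-nearest-neighbor classifier."""
    sq = (X ** 2).sum(axis=1)
    # Pairwise squared Euclidean distances via the Gram matrix.
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)  # a sample must not vote for itself
    correct = 0
    for i in range(X.shape[0]):
        neighbors = np.argsort(dist[i])[:k]
        # Majority vote; ties go to the smallest label.
        votes = np.bincount(y[neighbors])
        correct += int(np.argmax(votes) == y[i])
    return correct / X.shape[0]


def mrmr_style_ranking(X, y):
    """Greedy ranking by relevance minus mean redundancy.

    Assumption: |Pearson correlation| replaces mutual information here.
    """
    corr = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    corr = np.nan_to_num(corr)  # constant columns yield NaN correlations
    relevance = corr[:-1, -1]   # feature vs. label
    redundancy = corr[:-1, :-1]  # feature vs. feature
    selected = [int(np.argmax(relevance))]
    remaining = set(range(X.shape[1])) - set(selected)
    while remaining:
        candidates = sorted(remaining)
        scores = [relevance[j] - redundancy[j, selected].mean()
                  for j in candidates]
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(X, y, ranking, k=1):
    """Grow the feature set along the ranking; keep the best-scoring prefix."""
    best_acc, best_n = -1.0, 0
    for n in range(1, len(ranking) + 1):
        acc = loo_knn_accuracy(X[:, ranking[:n]], y, k)
        if acc > best_acc:
            best_acc, best_n = acc, n
    return ranking[:best_n], best_acc


if __name__ == "__main__":
    # Synthetic data only: feature 2 carries the class signal.
    rng = np.random.default_rng(0)
    y = np.array([0, 1] * 20)
    X = rng.normal(size=(40, 6))
    X[:, 2] += 6.0 * y
    ranking = mrmr_style_ranking(X, y)
    features, accuracy = incremental_feature_selection(X, y, ranking)
    print("selected features:", features, "jackknife accuracy:", accuracy)
```

In this sketch the jackknife test plays two roles, as in the abstract: it scores each candidate feature subset and it reports the final cross-validation accuracy. An independent test set, such as the 63 held-out oligopeptides, would be scored separately using only the selected features.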