School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China.
J Comput Chem. 2013 Apr 30;34(11):974-85. doi: 10.1002/jcc.23219. Epub 2013 Jan 3.
Understanding the interactions between proteins and ligands is critical for protein function annotation and drug discovery. We report a new sequence-based, template-free predictor, TargetATPsite, that uses machine-learning approaches to identify adenosine-5'-triphosphate (ATP) binding sites. TargetATPsite works in two steps: binding-residue prediction followed by binding-pocket prediction. To predict the binding residues, a novel image sparse representation technique is proposed to encode residue evolutionary information, which serves as the input features. The prediction model is an ensemble classifier built from support vector machines (SVMs) trained on multiple random under-samplings of the training data, which is effective for handling the imbalance between positive and negative training samples. Compared with existing ATP-specific sequence-based predictors, TargetATPsite is distinguished by its second step, which further identifies binding pockets from the predicted binding residues through a spatial clustering algorithm. Experimental results on three benchmark datasets demonstrate the efficacy of TargetATPsite.
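The under-sampling ensemble described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the SVM kernel, the under-sampling ratio, the number of ensemble members, or how member outputs are combined, so the sketch assumes a linear SVM trained with Pegasos-style stochastic subgradient descent, a 1:1 balanced sample of negatives per member, and majority voting. All function names are hypothetical, and toy 2-D vectors stand in for the sparse-representation features.

```python
import random
from typing import List, Sequence

Model = List[float]  # linear SVM weights; the last entry multiplies a constant bias feature


def _with_bias(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_linear_svm(xs, ys, lam=0.01, epochs=20, seed=0) -> Model:
    """Pegasos-style subgradient training of a linear SVM (labels +1 / -1).

    Minimizes lam/2 * ||w||^2 + mean hinge loss.
    """
    rng = random.Random(seed)
    data = [(_with_bias(x), y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularization step: shrink all weights toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w toward classifying x correctly.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def decision(w: Model, x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, _with_bias(x)))


def train_undersampled_ensemble(positives, negatives, n_models=5, seed=0) -> List[Model]:
    """Train one SVM per random under-sampling of the majority (negative) class."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    models = []
    for m in range(n_models):
        # Balanced subset: all positives plus k randomly drawn negatives.
        neg_subset = rng.sample(negatives, k)
        xs = list(positives) + neg_subset
        ys = [1] * len(positives) + [-1] * k
        models.append(train_linear_svm(xs, ys, seed=seed + m))
    return models


def predict(models: List[Model], x: Sequence[float]) -> int:
    """Majority vote of the members (an odd n_models avoids ties)."""
    votes = sum(1 if decision(w, x) > 0 else -1 for w in models)
    return 1 if votes > 0 else -1


# Toy imbalanced data: 5 "binding" vs 50 "non-binding" feature vectors.
pos = [[2.0 + 0.1 * i, 2.0 - 0.1 * i] for i in range(5)]
neg = [[-2.0 - 0.05 * i, -2.0 + 0.03 * i] for i in range(50)]
ensemble = train_undersampled_ensemble(pos, neg, n_models=5)
print(predict(ensemble, [2.0, 2.0]), predict(ensemble, [-2.0, -2.0]))  # 1 -1
```

Each member sees every positive example but only a different random tenth of the negatives, so no single model is swamped by the majority class, while the vote still draws on most of the negative data across the ensemble.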
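The second step, grouping predicted binding residues into pockets by spatial proximity, might look like the sketch below. The abstract names only "a spatial clustering algorithm", so this swaps in plain single-linkage clustering with a distance cutoff as a stand-in. The 8 Å cutoff, the minimum pocket size, the function name `cluster_pockets`, and the availability of one 3-D coordinate per residue (for example its Cα atom) are all assumptions for illustration; the residue numbers and coordinates in the example are made up.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def cluster_pockets(coords: Dict[int, Coord], cutoff: float = 8.0,
                    min_size: int = 2) -> List[List[int]]:
    """Group predicted binding residues into candidate pockets (hypothetical helper).

    Single-linkage clustering: two residues share a cluster if a chain of
    residues, each within `cutoff` angstroms of the next, connects them.
    Clusters smaller than `min_size` are dropped as isolated predictions.
    """
    ids = sorted(coords)
    parent = {i: i for i in ids}  # union-find forest over residue indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a_pos, a in enumerate(ids):
        for b in ids[a_pos + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[int, List[int]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    pockets = [sorted(g) for g in groups.values() if len(g) >= min_size]
    # Largest pocket first; ties broken by smallest residue index.
    return sorted(pockets, key=lambda g: (-len(g), g[0]))


# Hypothetical predicted binding residues (residue index -> coordinate in angstroms).
predicted = {
    12: (0.0, 0.0, 0.0),
    15: (3.8, 0.0, 0.0),
    40: (5.0, 4.0, 0.0),
    88: (30.0, 30.0, 30.0),
    91: (33.0, 30.0, 30.0),
    120: (-30.0, 0.0, 0.0),
}
print(cluster_pockets(predicted))  # [[12, 15, 40], [88, 91]]
```

Residue 120 is far from every other prediction and is discarded as a singleton. Single linkage lets a pocket be elongated, since residues 12 and 40 need only be linked through residue 15; a density-based method would be a natural alternative if isolated false positives were common.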