Ma Xin, Sun Xiao
Golden Audit College, Nanjing Audit University, Nanjing 210029, China.
State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China.
J Theor Biol. 2014 Nov 7;360:59-66. doi: 10.1016/j.jtbi.2014.06.037. Epub 2014 Jul 8.
We develop a computational and statistical approach, ATPBR, that predicts ATP-binding residues in proteins from amino acid sequences using random forests with a novel hybrid feature. The hybrid feature combines a new feature called PSSMPP, the predicted secondary structure and orthogonal binary vectors. The mRMR-IFS feature selection method is used to construct the best prediction model. ATPBR achieves significantly better performance than existing methods, with an accuracy of 87.53% and a Matthews correlation coefficient (MCC) of 0.554. Further analysis shows that PSSMPP distinguishes more effectively between ATP-binding and non-binding residues. Moreover, the optimal features selected by the mRMR-IFS method improve prediction performance and may provide useful insights into the mechanisms of ATP-protein interactions.
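The abstract names mRMR-IFS without describing it. The following is a minimal sketch, not the authors' implementation. It assumes the common MID form of mRMR, in which each step picks the feature with the highest relevance to the label minus its mean redundancy with the features already chosen, both measured by mutual information. It also assumes the features have already been discretized. Incremental feature selection (IFS) then scores the top-1, top-2, ... prefixes of that ranking and keeps the best one. The evaluation function is a caller-supplied placeholder. In ATPBR it would presumably be a random-forest evaluation, but the protocol is not stated here. PSSMPP is not reconstructed, and the function names `mutual_information`, `mrmr_rank` and `incremental_feature_selection` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) in nats for two equally long sequences of discrete values."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr_rank(features: List[Sequence[Hashable]],
              labels: Sequence[Hashable]) -> List[int]:
    """Rank feature columns with mRMR (MID: relevance - mean redundancy).

    `features` is a list of columns; each column holds one (assumed
    discretized) feature value per residue/sample.
    """
    relevance = [mutual_information(col, labels) for col in features]
    redundancy = [0.0] * len(features)  # running sum of I(f_j; f_s), s selected
    remaining = list(range(len(features)))
    ranked: List[int] = []
    while remaining:
        k = len(ranked)

        def score(j: int) -> Tuple[float, int]:
            penalty = redundancy[j] / k if k else 0.0
            return (relevance[j] - penalty, -j)  # ties go to the lower index

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
        for j in remaining:
            redundancy[j] += mutual_information(features[j], features[best])
    return ranked


def incremental_feature_selection(
        ranked: List[int],
        evaluate: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """IFS: score each prefix of the ranking and keep the best-scoring one."""
    best_subset: List[int] = []
    best_score = float("-inf")
    for i in range(1, len(ranked) + 1):
        subset = ranked[:i]
        s = evaluate(subset)  # e.g. cross-validated accuracy or MCC (assumed)
        if s > best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score


if __name__ == "__main__":
    # Toy data: f1 duplicates f0, f2 is weaker but less redundant, f3 is noise.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    f0 = [0, 0, 0, 1, 1, 1, 1, 1]
    cols = [f0, list(f0), [0, 1, 0, 0, 1, 0, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]]
    print(mrmr_rank(cols, labels))  # [0, 2, 1, 3]
```

In the toy example, the duplicate column f1 has higher relevance than f2. It still ranks below f2 because it adds no information beyond f0, and this redundancy penalty is the reason to use mRMR instead of ranking by relevance alone. Keeping `evaluate` as a parameter lets the same IFS loop wrap any classifier and metric, such as a random forest scored by MCC.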