Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China.
PLoS One. 2012;7(8):e43927. doi: 10.1371/journal.pone.0043927. Epub 2012 Aug 28.
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by applying various machine learning approaches to numerous characteristic features, the problem is still far from solved. In this study, we developed a novel predictor based on the Random Forest (RF) algorithm combined with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features describing physicochemical/biochemical properties, sequence conservation, residue disorder, secondary structure and solvent accessibility. We also included five 3D structural features. The resulting predictor achieved an overall accuracy of 0.672997 and a Matthews correlation coefficient (MCC) of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of PPI sites. Site-specific feature analysis further showed that the features of the individual residues at PPI sites contribute most to the determination of those sites. We anticipate that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction.
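The feature-selection pipeline named in the abstract (mRMR ranking followed by IFS, scored with MCC) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes discrete features, uses the difference form of mRMR (relevance minus mean redundancy), and replaces the cross-validated Random Forest with a toy OR rule so that the example needs only the standard library. The feature names `a`, `a_copy`, `b` and `noise` and all data values are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr_rank(columns, labels):
    """Greedy mRMR ordering: repeatedly pick the feature with the highest
    relevance to the labels minus mean redundancy with features already picked."""
    relevance = {name: mutual_information(col, labels) for name, col in columns.items()}
    ranked, remaining = [], list(columns)

    def gain(name):
        if not ranked:
            return relevance[name]
        redundancy = sum(mutual_information(columns[name], columns[s]) for s in ranked)
        return relevance[name] - redundancy / len(ranked)

    while remaining:
        best = max(remaining, key=gain)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def incremental_feature_selection(ranked, score_subset):
    """Score the nested subsets ranked[:1], ranked[:2], ... and keep the best one."""
    scores = [score_subset(ranked[:k]) for k in range(1, len(ranked) + 1)]
    best_k = max(range(len(scores)), key=scores.__getitem__) + 1
    return ranked[:best_k], scores


# Hypothetical toy data: 8 residues, label 1 = interface site.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "a":      [1, 1, 1, 0, 0, 0, 0, 0],  # informative
    "a_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # fully redundant with "a"
    "b":      [0, 0, 0, 1, 0, 0, 0, 0],  # weak alone, complements "a"
    "noise":  [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the labels
}


def or_rule_mcc(names):
    """Toy stand-in for a cross-validated classifier: predict 1 if any
    selected feature is 1, then score the predictions with MCC."""
    pred = [int(any(columns[n][i] for n in names)) for i in range(len(labels))]
    return mcc(labels, pred)


ranked = mrmr_rank(columns, labels)
selected, scores = incremental_feature_selection(ranked, or_rule_mcc)
print(ranked)
print(selected)
```

On this toy data the redundancy penalty ranks the duplicated column `a_copy` below `b`, even though `a_copy` is more relevant to the labels when considered alone. In a realistic setting `or_rule_mcc` would be replaced by a function that trains a Random Forest on the selected columns and returns a cross-validated MCC.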