School of Information Science and Technology, Northeast Normal University, Changchun 130024, Jilin, China.
Graduate School, Northeast Normal University, Changchun 130024, Jilin, China.
Dis Markers. 2022 Oct 4;2022:5892627. doi: 10.1155/2022/5892627. eCollection 2022.
Predicting protein-protein interaction (PPI) sites is one of the most challenging problems in drug discovery and computational biology. Although combining different machine learning techniques with a variety of features has brought significant progress, the problem remains unresolved. This study presents a method for predicting PPI sites that uses a random forest (RF) algorithm together with the minimum redundancy maximal relevance (mRMR) approach and incremental feature selection (IFS). The features incorporate physicochemical properties of proteins, residue disorder, sequence conservation, secondary structure, and solvent accessibility. Five 3D structural features are also used to predict PPI sites. Feature analysis shows that 3D structural features such as relative solvent-accessible surface area (RASA) and surface curvature (SC) help in predicting PPI sites. The proposed predictor achieves an average prediction accuracy of 81.44%, a sensitivity of 82.17%, and a specificity of 80.71%, outperforming several other state-of-the-art predictors. It is expected to become a helpful tool for finding PPI sites, and the feature analysis presented here offers useful insights into protein interaction mechanisms.
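For readers unfamiliar with the terms above, the following minimal Python sketch illustrates two of them: how accuracy, sensitivity, and specificity are computed for a binary site/non-site classifier, and how incremental feature selection walks down a ranked feature list and keeps the best-scoring prefix. It is an illustration under stated assumptions, not the authors' implementation. The function names, the `evaluate` callback, and the toy scores are hypothetical; in a setting like the one described, `evaluate` would presumably wrap a cross-validated random forest, and the ranking would come from mRMR.

```python
from typing import Callable, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, sensitivity, specificity for labels 1 (site) and 0 (non-site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of true sites that were recovered.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Specificity: fraction of non-sites correctly rejected.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Score the top-1, top-2, ... prefixes of a ranked list; keep the best.

    `ranked` is assumed to be ordered already (e.g. by an mRMR ranking).
    `evaluate` is a hypothetical callback returning a score for a subset.
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy usage: made-up scores keyed by subset size, for illustration only.
toy_scores = {1: 0.70, 2: 0.80, 3: 0.75}
subset, score = incremental_feature_selection(
    ["RASA", "SC", "noise"], lambda feats: toy_scores[len(feats)]
)
metrics = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

Keeping the scoring step behind a callback means the same loop works with any classifier and any cross-validation scheme; only the ranking and the evaluation function change.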