Pan Jie, Wang Shiwei, Yu Changqing, Li Liping, You Zhuhong, Sun Yanmei
Key Laboratory of Resources Biology and Biotechnology in Western China, Ministry of Education, College of Life Science, Northwest University, Xi'an 710069, China.
School of Information Engineering, Xijing University, Xi'an 710123, China.
Biology (Basel). 2022 May 19;11(5):775. doi: 10.3390/biology11050775.
Protein-protein interactions (PPIs) are crucial for understanding cellular processes, including signal cascades, DNA transcription, metabolic cycles, and repair. In the past decade, a multitude of high-throughput methods have been introduced to detect PPIs. However, these techniques are time-consuming and laborious, and they often suffer from high false-negative rates. Therefore, there is a great need for new computational methods as a supplementary tool for PPI prediction. In this article, we present a novel sequence-based model for predicting PPIs that combines the Discrete Hilbert Transform (DHT) and Rotation Forest (RoF). The method has three stages. First, a Position-Specific Scoring Matrix (PSSM) is used to represent each amino acid sequence, capturing rich information about protein evolution. Second, a 400-dimensional DHT descriptor is constructed for each protein pair. Finally, these feature descriptors are fed to the RoF classifier to identify the potential PPI class. When evaluated on three PPI datasets, the model yielded prediction accuracies of 91.93%, 96.35%, and 94.24%, respectively. We also conducted numerous experiments on cross-species PPI datasets, on which the method likewise showed strong predictive capacity. To further assess the prediction ability of the proposed approach, we compared RoF with four powerful classifiers: Support Vector Machine (SVM), Random Forest (RF), K-nearest Neighbor (KNN), and AdaBoost. We also compared it with existing state-of-the-art methods. These comprehensive experimental results further confirm the effectiveness and feasibility of the proposed approach. In future work, we hope it can serve as a supplementary tool for proteomics analysis.
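The feature-extraction stage described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the 400 dimensions are obtained, so everything beyond the Hilbert transform itself is an assumption made for illustration: each PSSM is taken to be an L x 20 array, the transform is applied to each column along the sequence axis, the result is pooled into 10 segment means per column (200 values per protein), and the two proteins' vectors are concatenated. The function names `discrete_hilbert`, `protein_descriptor`, and `pair_descriptor` are hypothetical, and the pooling scheme should not be read as the authors' construction. The classifier stage is omitted.

```python
import numpy as np


def discrete_hilbert(x):
    """Discrete Hilbert transform of a real 1-D signal, computed via the FFT."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    spectrum = np.fft.fft(x)
    # Multiplier that builds the analytic signal: keep DC (and Nyquist for
    # even n), double the positive frequencies, zero the negative ones.
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * h)
    # The imaginary part of the analytic signal is the Hilbert transform.
    return analytic.imag


def protein_descriptor(pssm, n_segments=10):
    """Fixed-length descriptor from an L x 20 PSSM (ASSUMED pooling scheme)."""
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2 or pssm.shape[1] != 20:
        raise ValueError("expected a PSSM of shape (L, 20)")
    if pssm.shape[0] < n_segments:
        raise ValueError("sequence is shorter than the number of segments")
    # Transform each of the 20 columns along the sequence axis.
    transformed = np.column_stack(
        [discrete_hilbert(pssm[:, j]) for j in range(20)]
    )
    # Assumption: average over n_segments roughly equal chunks of the sequence
    # so that proteins of different lengths map to the same dimensionality.
    segments = np.array_split(transformed, n_segments, axis=0)
    return np.concatenate([seg.mean(axis=0) for seg in segments])


def pair_descriptor(pssm_a, pssm_b):
    """Concatenate two 200-dimensional protein descriptors into 400 values."""
    return np.concatenate(
        [protein_descriptor(pssm_a), protein_descriptor(pssm_b)]
    )


# Usage with random stand-in PSSMs (not real data).
rng = np.random.default_rng(0)
pssm_a = rng.integers(-5, 10, size=(120, 20))
pssm_b = rng.integers(-5, 10, size=(87, 20))
features = pair_descriptor(pssm_a, pssm_b)
print(features.shape)  # (400,)
```

A vector like `features` would then be passed to a classifier. Rotation Forest is not part of scikit-learn, so reproducing that stage would need a separate implementation.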