Huang Qiaoying, You Zhuhong, Zhang Xiaofeng, Zhou Yong
Shenzhen Graduate School, Harbin Institute of Technology, HIT Campus of University Town of Shenzhen, Shenzhen 518055, China.
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China.
Int J Mol Sci. 2015 May 13;16(5):10855-69. doi: 10.3390/ijms160510855.
With the completion of the Human Genome Project, bioscience has entered the era of the genome and proteome, and research on protein-protein interactions (PPIs) has become increasingly important. PPIs are inseparable from life activities such as DNA synthesis, gene transcription activation and protein translation. Although many methods based on biological experiments and machine learning have been proposed, they require long training times and achieve limited accuracy. Predicting PPIs efficiently and accurately therefore remains a major challenge. To address it, we developed a new predictor that incorporates reduced amino acid alphabet (RAAA) information into the general form of pseudo-amino acid composition (PseAAC) and uses weighted sparse representation-based classification (WSRC). A notable advantage of the reduced amino acid alphabet is that it helps avoid the curse of dimensionality and the overfitting problem in statistical prediction. In addition, experiments show that our method performs well in both low- and high-dimensional feature spaces. Among all of the experiments performed on the PPI data of Saccharomyces cerevisiae, the best one achieved 90.91% accuracy, 94.17% sensitivity, 87.22% precision and an 83.43% Matthews correlation coefficient (MCC) value. To evaluate the prediction ability of our method, extensive experiments were performed to compare it with a state-of-the-art technique, the support vector machine (SVM). The results show that the proposed approach is very promising for predicting PPIs and can serve as a helpful supplement to existing PPI prediction methods.
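The dimensionality argument for a reduced alphabet can be illustrated with a minimal sketch. The six-group clustering, the function names and the concatenation of the two proteins' vectors are all assumptions made for illustration; the abstract does not state which clustering or pairing scheme was used, and the sketch covers neither the full PseAAC formulation nor WSRC.

```python
# Hypothetical six-group reduced alphabet covering the 20 standard residues.
# Illustrative only: this grouping is NOT taken from the paper.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Map each residue letter to the index of its group.
RESIDUE_TO_GROUP = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def reduced_composition(sequence):
    """Return the fraction of residues that fall in each reduced-alphabet group."""
    counts = [0] * len(GROUPS)
    total = 0
    for aa in sequence.upper():
        group = RESIDUE_TO_GROUP.get(aa)
        if group is None:
            # Skip non-standard symbols such as X or B.
            continue
        counts[group] += 1
        total += 1
    if total == 0:
        return [0.0] * len(GROUPS)
    return [c / total for c in counts]


def pair_features(seq_a, seq_b):
    """Assumed pairing scheme: concatenate the two proteins' composition vectors."""
    return reduced_composition(seq_a) + reduced_composition(seq_b)
```

With this grouping, each protein is described by 6 composition values instead of 20, so a protein pair yields a 12-dimensional vector. Higher-order descriptors built on the same alphabet shrink faster still: dipeptide composition, for example, drops from 400 to 36 terms per protein.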