College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China.
Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio 43210, USA.
J Theor Biol. 2019 Feb 7;462:329-346. doi: 10.1016/j.jtbi.2018.11.011. Epub 2018 Nov 16.
Research on protein-protein interactions (PPIs) not only helps to reveal the nature of life activities but also drives the understanding of disease mechanisms and the development of effective drugs. The rapid development of machine learning, which plays an important role in proteomics research, brings both new opportunities and new challenges for understanding the mechanisms of PPIs. In recent years, an increasing number of computational methods for predicting PPIs have been developed. This paper proposes a new PPI prediction method based on multi-information fusion. First, the pseudo-amino acid composition (PseAAC), auto-covariance (AC) and encoding based on grouped weight (EBGW) methods are used to extract features from protein sequences, and the three resulting groups of feature vectors are fused. Second, the fused feature vectors are denoised by two-dimensional (2-D) wavelet denoising. Finally, the denoised feature vectors are input to a support vector machine (SVM) classifier to predict PPIs. Under 5-fold cross-validation, the prediction accuracy (ACC) on the Helicobacter pylori (H. pylori) and Saccharomyces cerevisiae (S. cerevisiae) datasets was 95.97% and 95.55%, respectively, and the method was compared with other prediction methods. The experimental results show that the proposed multi-information fusion method effectively improves PPI prediction performance. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/PPIs-WDSVM/.
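The fuse-then-denoise idea in the abstract can be illustrated with a minimal sketch. It is not the authors' code (that is in the linked repository). Several assumptions apply. The AC step uses a single property, the Kyte-Doolittle hydropathy scale, because the abstract does not name the properties. The PseAAC and EBGW blocks are replaced by hypothetical random placeholders. Fused vectors are assumed to be stacked row-wise into a matrix before 2-D denoising. The one-level Haar transform with soft thresholding stands in for whatever wavelet basis and threshold rule the paper actually uses. The helper names `auto_covariance`, `fuse` and `haar_denoise_2d` are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy values. ASSUMPTION: a single illustrative property;
# the abstract does not say which physicochemical properties the AC step uses.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(seq, scale=KD, max_lag=5):
    """AC descriptor for one property: one covariance value per lag 1..max_lag."""
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    table = np.array(list(scale.values()), dtype=float)
    # Standardize the property over the 20 amino acids before encoding.
    vals = (np.array([scale[aa] for aa in seq], dtype=float) - table.mean()) / table.std()
    centered = vals - vals.mean()
    n = len(vals)
    return np.array([
        np.dot(centered[: n - lag], centered[lag:]) / (n - lag)
        for lag in range(1, max_lag + 1)
    ])


def fuse(*blocks):
    """Serial fusion: concatenate per-protein feature blocks into one vector."""
    return np.concatenate([np.ravel(b) for b in blocks])


def haar_denoise_2d(x, thresh):
    """One-level orthonormal 2-D Haar transform, soft-threshold the three
    detail sub-bands, then invert. Both dimensions of x must be even."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2-D array with even dimensions")
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2  # approximation (scaled 2x2 block mean)
    lh = (a - b + c - d) / 2  # detail: difference between neighbouring columns
    hl = (a + b - c - d) / 2  # detail: difference between neighbouring rows
    hh = (a - b - c + d) / 2  # detail: diagonal difference

    def soft(w):
        # Soft thresholding shrinks small detail coefficients to zero.
        return np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)

    lh, hl, hh = soft(lh), soft(hl), soft(hh)
    # Inverse transform back to the original layout.
    out = np.empty_like(x)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for the PseAAC and EBGW blocks of one protein.
    pseaac_block, ebgw_block = rng.normal(size=25), rng.normal(size=30)
    fused = fuse(pseaac_block, auto_covariance("MKTAYIAKQRQISFVKSHFSRQ"), ebgw_block)
    # ASSUMPTION: fused vectors of several samples are stacked row-wise.
    matrix = np.vstack([fused, fused + rng.normal(scale=0.1, size=fused.size)])
    denoised = haar_denoise_2d(matrix, thresh=0.05)
    print(denoised.shape)
```

With `thresh=0` the transform reconstructs its input exactly. A very large threshold replaces every 2x2 block with its mean. In a setup like this, the rows of the denoised matrix would then be passed to an SVM, for example scikit-learn's `SVC` evaluated with `cross_val_score(..., cv=5)`. The kernel and parameters the paper used are not stated in the abstract.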