An Ji-Yong, Zhang Lei, Zhou Yong, Zhao Yu-Jun, Wang Da-Fu
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 21116, Jiangsu, China.
J Cheminform. 2017 Aug 18;9(1):47. doi: 10.1186/s13321-017-0233-z.
Self-interacting proteins (SIPs) are important for their biological activity owing to the inherent interactions among their secondary structures or domains. However, because experimental methods for detecting self-interactions are limited, a major challenge in SIP research is to develop computational approaches that detect SIPs from the evolutionary information contained in protein sequences. In this work, we present a novel computational approach named WELM-LAG, which combines the Weighted Extreme Learning Machine (WELM) classifier with Local Average Group (LAG) feature extraction to predict SIPs from protein sequences. The main improvement of our method is an effective feature extraction scheme that represents candidate self-interacting proteins by exploiting the evolutionary information embedded in the position-specific scoring matrix (PSSM) constructed by PSI-BLAST; a reliable and robust WELM classifier then performs the classification. In addition, Principal Component Analysis (PCA) is used to reduce the impact of noise. WELM-LAG achieved average accuracies of 92.94% and 96.74% on the yeast and human datasets, respectively. We also compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets. The comparative results indicate that our approach is promising and may provide a cost-effective alternative for predicting SIPs. Finally, we developed a freely available web server, WELM-LAG-SIPs (http://219.219.62.123:8888/WELMLAG/), for predicting SIPs.
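The pipeline summarized above (PSSM, then LAG features, then PCA, then WELM) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not specify these details, so several points are assumptions: LAG is taken to split the L x 20 PSSM (one row per residue, one column per amino acid) into 20 contiguous row groups and average each column within a group, giving 400 features; the classifier uses the closed-form weighted ELM solution with sample weights equal to the inverse size of each sample's class. The function names, the sigmoid activation, `n_hidden`, `C`, and the seed are hypothetical illustrative choices, and running PSI-BLAST itself is not shown (the input is assumed to be an already computed PSSM).

```python
import numpy as np


def lag_features(pssm, n_groups=20):
    """Local Average Group features (assumed definition): split an L x 20 PSSM
    row-wise into n_groups contiguous segments and average each column within
    each segment, giving an n_groups * 20 vector."""
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2 or pssm.shape[1] != 20:
        raise ValueError("expected an L x 20 PSSM")
    if pssm.shape[0] < n_groups:
        raise ValueError("sequence is shorter than the number of groups")
    # array_split tolerates lengths that are not divisible by n_groups
    groups = np.array_split(pssm, n_groups, axis=0)
    return np.concatenate([g.mean(axis=0) for g in groups])


def pca_fit(X, n_components):
    """Return the data mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by decreasing variance
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project centred data onto the retained principal directions."""
    return (X - mean) @ components.T


class WeightedELM:
    """Weighted ELM sketch: a random hidden layer plus a weighted ridge solution
    for the output weights, beta = (I/C + H^T W H)^{-1} H^T W T."""

    def __init__(self, n_hidden=100, C=1.0, seed=0):
        self.n_hidden = n_hidden
        self.C = C
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation, written via tanh to avoid overflow in exp
        return 0.5 * (1.0 + np.tanh(0.5 * (X @ self.A + self.b)))

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_, counts = np.unique(y, return_counts=True)
        # One-vs-all targets in {-1, +1}
        T = np.where(y[:, None] == self.classes_[None, :], 1.0, -1.0)
        # Sample weight = 1 / size of its class, so the minority class counts more
        w = 1.0 / counts[np.searchsorted(self.classes_, y)]
        # Random input weights and biases are drawn once and never trained
        self.A = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        HtW = H.T * w  # equivalent to H^T @ diag(w)
        lhs = np.eye(self.n_hidden) / self.C + HtW @ H
        self.beta = np.linalg.solve(lhs, HtW @ T)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for PSI-BLAST PSSMs of varying length (not real data)
    pssms = [rng.normal(size=(rng.integers(50, 300), 20)) for _ in range(60)]
    labels = np.array([1] * 10 + [0] * 50)  # imbalanced toy labels
    F = np.array([lag_features(p) for p in pssms])  # shape (60, 400)
    mean, comps = pca_fit(F, 30)
    Z = pca_transform(F, mean, comps)
    clf = WeightedELM(n_hidden=200, C=10.0).fit(Z, labels)
    print("training accuracy:", (clf.predict(Z) == labels).mean())
```

Because LAG averages over a fixed number of groups, proteins of any length map to the same 400-dimensional vector, which is what lets a fixed-size classifier consume them. Solving the `n_hidden` x `n_hidden` system, rather than the equivalent N x N form, is the cheaper choice when there are more training proteins than hidden nodes.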