Zahiri Javad, Mohammad-Noori Morteza, Ebrahimpour Reza, Saadat Samaneh, Bozorgmehr Joseph H, Goldberg Tatyana, Masoudi-Nejad Ali
Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran; Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran.
School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran.
Genomics. 2014 Dec;104(6 Pt B):496-503. doi: 10.1016/j.ygeno.2014.10.006. Epub 2014 Oct 16.
Protein-protein interaction (PPI) detection is one of the central goals of functional genomics and systems biology. Knowledge about the nature of PPIs can help fill the widening gap between sequence information and functional annotations. Although experimental methods have produced valuable PPI data, they also suffer from significant limitations. Computational PPI prediction methods have therefore attracted tremendous attention. Despite considerable efforts, PPI prediction is still in its infancy in complex multicellular organisms such as humans. Here, we propose a novel ensemble learning method, LocFuse, for human PPI prediction. This method uses eight different genomic and proteomic features along with four different types of classifiers. The prediction performance of this classifier selection method was found to be considerably better than that of previously used methods. This confirms the complex nature of the PPI prediction problem and also the necessity of using biological information for classifier fusion. LocFuse is available at: http://lbb.ut.ac.ir/Download/LBBsoft/LocFuse.
The results revealed that if we divide proteome space according to the cellular localization of proteins, the utility of some classifiers in PPI prediction can be improved. Therefore, to predict the interaction for any given protein pair, we can select the most accurate classifier for that pair's cellular localization. The results also indicate that the importance of different features for PPI prediction varies between differently localized proteins; in general, however, our novel features, which were extracted from position-specific scoring matrices (PSSMs), are the most important ones, and the Random Forest (RF) classifier performs best in most cases. LocFuse was developed with a user-friendly graphical interface and is freely available for Linux, Mac OS X and MS Windows operating systems.
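
The localization-based classifier selection described above can be illustrated with a minimal sketch. This is not the LocFuse implementation: the function names, the record format, the localization labels, the fallback to a default classifier, and the toy classifiers are all assumptions made for illustration. It only shows the general idea of choosing, for each localization class, the classifier with the highest validation accuracy and then routing a protein pair to that classifier.

```python
from collections import defaultdict


def select_classifiers(validation_records):
    """Pick the most accurate classifier for each localization class.

    validation_records: iterable of (localization, classifier_name, correct)
    tuples, where `correct` is True if that classifier predicted a
    validation pair correctly. (Hypothetical format, for illustration.)
    """
    counts = defaultdict(lambda: [0, 0])  # (loc, clf) -> [hits, total]
    for localization, name, correct in validation_records:
        counts[(localization, name)][0] += int(correct)
        counts[(localization, name)][1] += 1

    best = {}  # localization -> (classifier_name, accuracy)
    for (localization, name), (hits, total) in counts.items():
        accuracy = hits / total
        # Keep the first classifier seen when accuracies tie.
        if localization not in best or accuracy > best[localization][1]:
            best[localization] = (name, accuracy)
    return {loc: name for loc, (name, _) in best.items()}


def predict_interaction(localization, features, classifiers, selection,
                        default_name):
    """Route a protein pair to the classifier chosen for its localization.

    Falls back to `default_name` for localizations not seen in validation
    (an assumption of this sketch, not a documented LocFuse behaviour).
    """
    name = selection.get(localization, default_name)
    return name, classifiers[name](features)


# Toy validation outcomes (made-up data, only to exercise the code).
records = [
    ("nucleus", "RF", True), ("nucleus", "RF", True),
    ("nucleus", "SVM", True), ("nucleus", "SVM", False),
    ("membrane", "RF", False), ("membrane", "RF", True),
    ("membrane", "SVM", True), ("membrane", "SVM", True),
]
selection = select_classifiers(records)

# Stand-in classifiers: plain callables instead of trained models.
toy_classifiers = {
    "RF": lambda f: sum(f) > 1.0,
    "SVM": lambda f: f[0] > 0.5,
}
```

With these toy records, the nucleus class is routed to the stand-in "RF" classifier and the membrane class to the stand-in "SVM" classifier. In a real setting the callables would be trained models and the accuracies would come from cross-validation within each localization class.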