Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan.
PLoS One. 2013 Jun 11;8(6):e65265. doi: 10.1371/journal.pone.0065265. Print 2013.
Because many proteins carry out their functions by interacting with other proteins and forming protein complexes, identifying the sets of proteins that form complexes is very useful. To that end, many methods for predicting protein complexes from protein-protein interactions have been developed, such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt only with complexes of size more than three, because they are often based on some measure of subgraph density. However, according to several comprehensive databases of known complexes, heterodimeric protein complexes, which consist of two distinct proteins, account for a large fraction of known complexes. In this paper, we propose several feature space mappings from protein-protein interaction data in which each interaction is weighted according to its reliability. Furthermore, we use prior knowledge of protein domains to develop additional feature space mappings: a domain composition kernel and a kernel that combines it with our proposed features. We perform ten-fold cross-validation computational experiments. The results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes.
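To make the idea of a combined kernel concrete, the following is a minimal sketch, not the construction used in the paper. The abstract does not define the feature mappings, so everything here is an assumption chosen for illustration: the three interaction features in `pair_features`, the count-based `domain_composition`, the function names, and the toy data layout (`weights` keyed by unordered protein pairs, `domains` mapping a protein to a list of domain identifiers). The combination shown is a plain sum of two kernels, which is a valid kernel because the sum of two positive semidefinite kernels is positive semidefinite.

```python
from collections import Counter


def pair_features(a, b, weights):
    """Map a candidate pair (a, b) to a small feature vector.

    `weights` maps frozenset({p, q}) to a reliability weight.
    The three features below are hypothetical, for illustration only.
    """
    w_ab = weights.get(frozenset((a, b)), 0.0)
    # Weights of interactions that touch exactly one of a or b,
    # i.e. interactions with proteins outside the candidate pair.
    outside = [w for pair, w in weights.items() if (a in pair) != (b in pair)]
    return [w_ab, max(outside, default=0.0), float(len(outside))]


def domain_composition(a, b, domains):
    """Count how often each domain occurs across the two proteins (assumed form)."""
    return Counter(domains.get(a, ())) + Counter(domains.get(b, ()))


def linear_kernel(u, v):
    """Inner product of two equal-length feature vectors."""
    return float(sum(p * q for p, q in zip(u, v)))


def domain_kernel(x, y):
    """Inner product of two domain-count vectors stored as Counters."""
    return float(sum(x[d] * y[d] for d in x.keys() & y.keys()))


def combined_kernel(pair1, pair2, weights, domains):
    """Sum of the interaction-feature kernel and the domain kernel."""
    k_ppi = linear_kernel(pair_features(*pair1, weights),
                          pair_features(*pair2, weights))
    k_dom = domain_kernel(domain_composition(*pair1, domains),
                          domain_composition(*pair2, domains))
    return k_ppi + k_dom
```

A kernel of this kind could be evaluated over all candidate pairs to build a Gram matrix for a kernel classifier. In practice the features would likely need rescaling, since the raw neighbor count in this sketch can dominate the weight-based features.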