Computer Science and Engineering, Nanyang Technological University, Singapore.
Data Analytics Department, Institute for Infocomm Research, Singapore.
Methods. 2017 Dec 1;131:83-92. doi: 10.1016/j.ymeth.2017.06.036. Epub 2017 Jul 8.
Protein-protein interaction (PPI) networks play an important role in studying the functional roles of proteins, including their association with diseases. However, PPI networks alone are insufficient without additional biological knowledge about proteins, such as their molecular functions and the biological processes they take part in. To complement and enrich PPI networks, we propose to exploit the biological properties of individual proteins. More specifically, we integrate keywords describing protein properties into the PPI network and construct a novel PPI-Keywords (PPIK) network that contains both proteins and keywords as two different types of nodes. As disease proteins tend to have similar topological characteristics on the PPIK network, we further propose to represent proteins with metagraphs. Unlike a traditional network motif or subgraph, a metagraph can capture a particular topological arrangement involving the interactions and associations among both proteins and keywords. Based on these metagraph representations of proteins, we then build classifiers for disease protein classification through supervised learning. Our experiments on three different PPI databases demonstrate that the proposed method consistently improves disease protein prediction across various classifiers, raising AUC by 15.3% on average. It outperforms baselines including diffusion-based methods (e.g., RWR) and module-based methods by 13.8-32.9% for overall disease protein prediction. For predicting breast cancer genes, it outperforms RWR, PRINCE and the module-based baselines by 6.6-14.2%. Finally, our predictions also correlate better with literature findings from PubMed.
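The idea of a network with two node types, and of describing each protein by the small typed patterns around it, can be illustrated with a minimal sketch. It is not the authors' implementation: the four patterns below (protein-keyword edge, protein-protein-keyword triangle, protein-keyword-protein open path with no PPI, protein-only triangle), the class and function names, and the toy proteins and keywords are all hypothetical choices made for illustration. Each protein ends up with a vector of pattern counts that could then be passed to any supervised classifier.

```python
from collections import defaultdict
from itertools import combinations


class PPIKNetwork:
    """Hypothetical undirected graph with two node types: proteins and keywords."""

    def __init__(self):
        self.ppi = defaultdict(set)          # protein -> interacting proteins
        self.kw_of = defaultdict(set)        # protein -> associated keywords
        self.proteins_of = defaultdict(set)  # keyword -> associated proteins

    def add_interaction(self, p, q):
        self.ppi[p].add(q)
        self.ppi[q].add(p)

    def add_keyword(self, p, k):
        self.kw_of[p].add(k)
        self.proteins_of[k].add(p)

    def proteins(self):
        return sorted(set(self.ppi) | set(self.kw_of))


def metagraph_features(net, p):
    """Count instances of four illustrative (hypothetical) typed patterns anchored at p."""
    kws = net.kw_of.get(p, set())
    nbrs = net.ppi.get(p, set())
    # M1: p -- keyword
    m1 = len(kws)
    # M2: p -- q (PPI) plus a keyword shared by p and q (mixed triangle)
    m2 = sum(len(kws & net.kw_of.get(q, set())) for q in nbrs)
    # M3: p -- keyword -- q where p and q do NOT interact (open path)
    sharers = set()
    for k in kws:
        sharers |= net.proteins_of.get(k, set())
    m3 = sum(len(kws & net.kw_of.get(q, set()))
             for q in sharers if q != p and q not in nbrs)
    # M4: protein-only triangle p -- q -- r -- p
    m4 = sum(1 for q, r in combinations(sorted(nbrs), 2)
             if r in net.ppi.get(q, set()))
    return (m1, m2, m3, m4)


# Toy PPIK network (invented data, for illustration only)
net = PPIKNetwork()
for p, q in [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]:
    net.add_interaction(p, q)
for p, k in [("P1", "kinase"), ("P1", "nucleus"), ("P2", "kinase"),
             ("P3", "nucleus"), ("P3", "membrane"),
             ("P4", "membrane"), ("P4", "kinase")]:
    net.add_keyword(p, k)

features = {p: metagraph_features(net, p) for p in net.proteins()}
for p in sorted(features):
    print(p, features[p])
# P1 (2, 2, 1, 1)
# P2 (1, 1, 1, 1)
# P3 (2, 2, 0, 1)
# P4 (2, 1, 2, 0)
```

Counting patterns that mix both node types is what lets a protein such as P4, which has few interactions, still be distinguished by how it shares keywords with proteins it does not interact with; a purely protein-only pattern such as M4 cannot see that signal.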