Protein Information Technology Group, Eötvös University, Budapest, Hungary.
PLoS One. 2013;8(1):e54204. doi: 10.1371/journal.pone.0054204. Epub 2013 Jan 29.
Biological network data, such as metabolic, signaling, or physical interaction graphs of proteins, are increasingly available in public repositories for important species, and tools for the quantitative analysis of these networks are under active development. Protein network-based drug target identification methods usually return hubs, that is, proteins with large degrees in the network, as potentially important targets. Some known, important protein targets, however, are not hubs at all, and perturbing protein hubs may have several unwanted physiological effects because hubs interact with numerous partners. Here, we present a novel method, applicable to networks with directed edges (such as metabolic networks), that compensates for the low degree of non-hub vertices and identifies important nodes regardless of their hub properties. Our method computes the PageRank of each node and divides it by the node's in-degree (i.e., the number of incoming edges). This quotient is the same for all nodes of an undirected graph (for high- and low-degree nodes alike, that is, for hubs and non-hubs), but may differ significantly from node to node in directed graphs. We suggest assigning importance to non-hub nodes with a large PageRank/in-degree quotient. Consequently, our method gives high scores to nodes whose PageRank is large relative to their degree, so important non-hub nodes can easily be identified in large networks. We demonstrate that these relatively high PageRank scores have biological relevance: the method correctly finds numerous already validated drug targets in distinct organisms (Mycobacterium tuberculosis, Plasmodium falciparum, and methicillin-resistant Staphylococcus aureus, MRSA), and consequently it may suggest new possible protein targets as well.
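The scoring step described above (PageRank divided by in-degree) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pagerank` and `pagerank_per_in_degree`, the damping factor of 0.85, the uniform redistribution of rank from nodes without outgoing edges, the skipping of nodes with in-degree zero, and the toy graph are all assumptions made for illustration only.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a directed edge list [(source, target), ...]."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def pagerank_per_in_degree(edges, damping=0.85):
    """Score each node by PageRank divided by its number of incoming edges."""
    rank = pagerank(edges, damping=damping)
    in_degree = defaultdict(int)
    for _, target in edges:
        in_degree[target] += 1
    # Assumption: nodes with in-degree 0 have no defined quotient and are skipped.
    return {v: rank[v] / in_degree[v] for v in rank if in_degree[v] > 0}


# Hypothetical toy network: "hub" has three incoming edges, "x" has only one.
toy_edges = [
    ("h1", "hub"), ("h2", "hub"), ("h3", "hub"),
    ("hub", "x"),
    ("x", "h1"), ("x", "h2"), ("x", "h3"),
]
scores = pagerank_per_in_degree(toy_edges)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.4f}")
```

In this toy graph the node `x` has a single incoming edge but inherits almost all of the rank collected by `hub`, so its PageRank/in-degree quotient is the largest even though it is not a hub.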
Additionally, our scoring method was not chosen arbitrarily: in any undirected graph its value is the same for every node; a high value therefore captures importance that arises from the directed edge structure of the graph.
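The constancy of the quotient on undirected graphs can be checked with a short derivation. The sketch below assumes a connected undirected graph with $m$ edges, each edge treated as two opposite arcs (so in-degree equals degree $d(v)$), and PageRank without teleportation (damping factor 1), that is, the stationary distribution $\pi$ of the simple random walk. The symbols $\pi$, $d$, and $m$ are introduced here for illustration; with a damping factor below 1 the equality should be expected to hold only approximately.

```latex
% Candidate stationary distribution of the simple random walk:
\pi(v) = \frac{d(v)}{2m}
% It is stationary, since each neighbour u passes on a 1/d(u) share of its mass:
\sum_{u \sim v} \pi(u)\,\frac{1}{d(u)}
  = \sum_{u \sim v} \frac{d(u)}{2m}\cdot\frac{1}{d(u)}
  = \frac{d(v)}{2m}
  = \pi(v)
% Hence the PageRank/in-degree quotient is the same for every node:
\frac{\pi(v)}{d(v)} = \frac{1}{2m}
```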