使用图let度分布进行生物网络比较。

Biological network comparison using graphlet degree distribution.

作者信息

Przulj Natasa

机构信息

Computer Science Department, University of California, Irvine, CA 92697-3425, USA.

出版信息

Bioinformatics. 2007 Jan 15;23(2):e177-83. doi: 10.1093/bioinformatics/btl301.

DOI:10.1093/bioinformatics/btl301

PMID:17237089

Abstract

MOTIVATION

Analogous to biological sequence comparison, comparing cellular networks is an important problem that could provide insight into biological understanding and therapeutics. For technical reasons, comparing large networks is computationally infeasible, and thus heuristics, such as the degree distribution, clustering coefficient, diameter, and relative graphlet frequency distribution have been sought. It is easy to demonstrate that two networks are different by simply showing a short list of properties in which they differ. It is much harder to show that two networks are similar, as it requires demonstrating their similarity in all of their exponentially many properties. Clearly, it is computationally prohibitive to analyze all network properties, but the larger the number of constraints we impose in determining network similarity, the more likely it is that the networks will truly be similar.

RESULTS

We introduce a new systematic measure of a network's local structure that imposes a large number of similarity constraints on networks being compared. In particular, we generalize the degree distribution, which measures the number of nodes 'touching' k edges, into distributions measuring the number of nodes 'touching' k graphlets, where graphlets are small connected non-isomorphic subgraphs of a large network. Our new measure of network local structure consists of 73 graphlet degree distributions of graphlets with 2-5 nodes, but it is easily extendible to a greater number of constraints (i.e. graphlets), if necessary, and the extensions are limited only by the available CPU. Furthermore, we show a way to combine the 73 graphlet degree distributions into a network 'agreement' measure which is a number between 0 and 1, where 1 means that networks have identical distributions and 0 means that they are far apart. Based on this new network agreement measure, we show that almost all of the 14 eukaryotic PPI networks, including human, resulting from various high-throughput experimental techniques, as well as from curated databases, are better modeled by geometric random graphs than by Erdös-Rény, random scale-free, or Barabási-Albert scale-free networks.

AVAILABILITY

Software executables are available upon request.

摘要

动机

类似于生物序列比较，比较细胞网络是一个重要问题，可为生物学理解和治疗学提供见解。由于技术原因，比较大型网络在计算上是不可行的，因此人们一直在寻找启发式方法，例如度分布、聚类系数、直径和相对图元频率分布。通过简单列出两个网络不同的一小部分属性，很容易证明它们是不同的。而要证明两个网络相似则困难得多，因为这需要证明它们在指数级数量的所有属性上都相似。显然，分析所有网络属性在计算上是禁止的，但我们在确定网络相似性时施加的约束越多，网络真正相似的可能性就越大。

结果

我们引入了一种新的网络局部结构系统度量，该度量对被比较的网络施加了大量相似性约束。特别是，我们将度分布（用于测量“接触”k条边的节点数量）推广为测量“接触”k个图元的节点数量的分布，其中图元是大型网络中连通的非同构小子图。我们新的网络局部结构度量由73个2至5个节点的图元的图元度分布组成，但如有必要，它很容易扩展到更多约束（即图元），并且扩展仅受可用CPU的限制。此外，我们展示了一种将73个图元度分布组合成网络“一致性”度量的方法，该度量是一个介于0和1之间的数字，其中1表示网络具有相同的分布，0表示它们相差很远。基于这种新的网络一致性度量，我们表明，几乎所有14个真核生物蛋白质-蛋白质相互作用（PPI）网络，包括来自各种高通量实验技术以及经过整理的数据库的人类PPI网络，用几何随机图建模比用厄多斯-雷尼随机图、随机无标度图或巴拉巴西-阿尔伯特无标度网络建模更好。