Dong Jun, Horvath Steve
Department of Human Genetics and Department of Biostatistics, University of California, Los Angeles, CA 90095, USA.
BMC Syst Biol. 2007 Jun 4;1:24. doi: 10.1186/1752-0509-1-24.
Network concepts are increasingly used in biology and genetics. For example, the clustering coefficient has been used to understand network architecture; the connectivity (also known as degree) has been used to screen for cancer targets; and the topological overlap matrix has been used to define modules and to annotate genes. Dozens of potentially useful network concepts are known from graph theory.
Here we study network concepts in special types of networks, which we refer to as approximately factorizable networks. In these networks, the pairwise connection strength (adjacency) between two network nodes can be factored into node-specific contributions, named node 'conformity'. The node conformity turns out to be highly related to the connectivity. To provide a formalism for relating network concepts to each other, we define three types of network concepts: fundamental, conformity-based, and approximate conformity-based concepts. Fundamental concepts include the standard definitions of connectivity, density, centralization, heterogeneity, clustering coefficient, and topological overlap. The approximate conformity-based analogs of fundamental network concepts have several theoretical advantages. First, they allow one to derive simple relationships between seemingly disparate network concepts. For example, we derive simple relationships between the clustering coefficient, the heterogeneity, the density, the centralization, and the topological overlap. Second, they allow one to show that, in module networks, fundamental network concepts can be approximated by simple functions of the connectivity.
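The factorization idea can be illustrated with a minimal sketch, assuming an exactly factorizable network in which the adjacency between distinct nodes i and j is the product of their conformities. The function names, the example conformity values, and the choice to set the diagonal to zero are illustrative assumptions, not taken from the authors' R code; the clustering coefficient uses a common weighted-network generalization, which is also an assumption here.

```python
# Sketch: fundamental network concepts in an exactly factorizable network.
# Assumption: a_ij = cf_i * cf_j for i != j, and diagonal entries are
# excluded from every sum (stored here as 0.0).

def factorizable_adjacency(conformity):
    """Build a symmetric adjacency matrix from a conformity vector."""
    n = len(conformity)
    return [
        [0.0 if i == j else conformity[i] * conformity[j] for j in range(n)]
        for i in range(n)
    ]

def connectivity(adj):
    """Connectivity (degree) of each node: row sum without the diagonal."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]

def density(adj):
    """Mean off-diagonal adjacency."""
    n = len(adj)
    return sum(connectivity(adj)) / (n * (n - 1))

def clustering_coefficient(adj, i):
    """Weighted clustering coefficient of node i (needs at least 3 nodes
    and a nonzero denominator)."""
    others = [j for j in range(len(adj)) if j != i]
    numerator = sum(
        adj[i][j] * adj[j][k] * adj[k][i]
        for j in others for k in others if k != j
    )
    row_sum = sum(adj[i][j] for j in others)
    denominator = row_sum ** 2 - sum(adj[i][j] ** 2 for j in others)
    return numerator / denominator

# Hypothetical conformity values for a five-node module.
cf = [0.9, 0.8, 0.7, 0.6, 0.5]
adj = factorizable_adjacency(cf)
print("connectivity:", connectivity(adj))
print("density:", density(adj))
print("clustering:", [clustering_coefficient(adj, i) for i in range(len(cf))])
```

Under these assumptions, the connectivity of node i equals cf_i * (sum(cf) - cf_i), which makes concrete why conformity and connectivity are closely related in such networks.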
Using protein-protein interaction, gene co-expression, and simulated data, we show that (a) many networks composed of module nodes are approximately factorizable and (b) in these types of networks, simple relationships exist between seemingly disparate network concepts. Our results are implemented in freely available R software code, which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/ModuleConformity/ModuleNetworks.