Zhang Bin, Horvath Steve
Department of Human Genetics, University of California at Los Angeles, USA.
Stat Appl Genet Mol Biol. 2005;4:Article17. doi: 10.2202/1544-6115.1128. Epub 2005 Aug 12.
Gene co-expression networks are increasingly used to explore the system-level functionality of genes. The network construction is conceptually straightforward: nodes represent genes and nodes are connected if the corresponding genes are significantly co-expressed across appropriately chosen tissue samples. In reality, it is tricky to define the connections between the nodes in such networks. An important question is whether it is biologically meaningful to encode gene co-expression using binary information (connected=1, unconnected=0). We describe a general framework for 'soft' thresholding that assigns a connection weight to each gene pair. This leads us to define the notion of a weighted gene co-expression network. For soft thresholding we propose several adjacency functions that convert the co-expression measure to a connection weight. For determining the parameters of the adjacency function, we propose a biologically motivated criterion (referred to as the scale-free topology criterion). We generalize the following important network concepts to the case of weighted networks. First, we introduce several node connectivity measures and provide empirical evidence that they can be important for predicting the biological significance of a gene. Second, we provide theoretical and empirical evidence that the 'weighted' topological overlap measure (used to define gene modules) leads to more cohesive modules than its 'unweighted' counterpart. Third, we generalize the clustering coefficient to weighted networks. Unlike the unweighted clustering coefficient, the weighted clustering coefficient is not inversely related to the connectivity. We provide a model that shows how an inverse relationship between clustering coefficient and connectivity arises from hard thresholding. We apply our methods to simulated data, a cancer microarray data set, and a yeast microarray data set.
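As a rough illustration of the ideas summarized above, the following Python sketch contrasts soft and hard thresholding and computes a weighted topological overlap. The abstract does not spell out the formulas, so the specific choices here are assumptions: a power adjacency of the form |cor|^beta, connectivity as the row sum of the adjacency matrix, a topological overlap of the form (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), and a scale-free fit measured as the R^2 of a straight line through log10 p(k) versus log10 k. All function names, the default `beta=6.0`, the cutoff `tau=0.7`, and the bin count are hypothetical and chosen only for illustration.

```python
import numpy as np


def abs_correlation(expr):
    """Absolute Pearson correlation between genes; expr is samples x genes."""
    return np.abs(np.corrcoef(expr, rowvar=False))


def soft_adjacency(expr, beta=6.0):
    """Soft thresholding (assumed power form): a_ij = |cor_ij| ** beta."""
    adj = abs_correlation(expr) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-connections
    return adj


def hard_adjacency(expr, tau=0.7):
    """Hard thresholding: a_ij = 1 if |cor_ij| >= tau, else 0."""
    adj = (abs_correlation(expr) >= tau).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def connectivity(adj):
    """Node connectivity k_i: sum of connection weights of gene i."""
    return adj.sum(axis=1)


def topological_overlap(adj):
    """Topological overlap for a weighted or unweighted adjacency matrix.

    Because the diagonal of adj is zero, (adj @ adj)[i, j] sums a_iu * a_uj
    over the neighbours u other than i and j.
    """
    k = connectivity(adj)
    shared = adj @ adj
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)  # a gene overlaps fully with itself
    return tom


def scale_free_fit(k, n_bins=10):
    """Slope and R^2 of a linear fit of log10 p(k) against log10 k.

    Connectivities are binned into n_bins equal-width bins; empty bins
    are dropped before taking logarithms.
    """
    counts, edges = np.histogram(k, bins=n_bins)
    mids = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0
    x = np.log10(mids[keep])
    y = np.log10(counts[keep] / counts.sum())
    slope, _intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return slope, r2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(40, 200))  # synthetic data: 40 samples, 200 genes
    for beta in (1.0, 4.0, 8.0):
        k = connectivity(soft_adjacency(expr, beta=beta))
        slope, r2 = scale_free_fit(k)
        print(f"beta={beta}: slope={slope:.2f}, R^2={r2:.2f}")
```

Under these assumptions, one would scan a range of `beta` values and prefer one where the fit is close to linear with a negative slope. The random data in the usage block only exercises the functions; it is not expected to show scale-free behaviour.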