College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin 150001, China.
Genes (Basel). 2021 Jan 12;12(1):87. doi: 10.3390/genes12010087.
Among biological networks, co-expression networks have been widely studied. One of the most commonly used pipelines for constructing co-expression networks is weighted gene co-expression network analysis (WGCNA), which identifies clusters of highly co-expressed genes (modules) using hierarchical clustering. The major drawback of hierarchical clustering is that once two objects are clustered together, the merge cannot be reversed, so an unsuitable decision can never be re-adjusted. In this paper, we calculate the similarity matrix for WGCNA with the distance correlation to construct a gene co-expression network, and present a new approach, the k-module algorithm, to improve the WGCNA clustering results. The method assigns each gene to the module with which it has the highest mean connectivity. In this way the algorithm re-adjusts the results of hierarchical clustering while retaining the advantages of the dynamic tree cut method. The validity of the algorithm is verified on six datasets from microarray and RNA-seq data. The k-module algorithm needs fewer iterations, which leads to lower complexity. We verify that the gene modules obtained by the k-module algorithm have high enrichment scores and strong stability. Our method improves upon hierarchical clustering and can be applied to general clustering algorithms based on a similarity matrix; it is not limited to gene co-expression network analysis.
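The reassignment idea described in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the abstract gives no pseudocode, so the function name `k_module_reassign`, the synchronous update of all genes per pass, the exclusion of a gene's self-connection, the rule that ties keep the current module, and the `max_iter` cap are all assumptions made for illustration. The input `adjacency` stands for any symmetric similarity matrix (for example one derived from distance correlation), and `labels` for an initial partition such as one produced by hierarchical clustering with a dynamic tree cut.

```python
import numpy as np


def k_module_reassign(adjacency, labels, max_iter=100):
    """Move each gene to the module with the highest mean connectivity.

    Hypothetical sketch: repeats synchronous reassignment passes until
    no gene changes module or max_iter passes have been made.
    Returns the final labels and the number of passes performed.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels).copy()
    n_genes = adjacency.shape[0]
    modules = np.unique(labels)          # sorted module identifiers
    self_conn = np.diag(adjacency)
    passes = 0

    for passes in range(1, max_iter + 1):
        # membership[i, k] == 1 when gene i currently belongs to module k
        membership = (labels[:, None] == modules[None, :]).astype(float)

        # Total connectivity of every gene to every module,
        # leaving out the gene's connection to itself.
        sums = adjacency @ membership - self_conn[:, None] * membership

        # Number of *other* genes in each module, per gene.
        sizes = membership.sum(axis=0)[None, :] - membership

        # Mean connectivity; modules with no other members score 0.
        mean_conn = np.divide(sums, sizes,
                              out=np.zeros_like(sums), where=sizes > 0)

        current_idx = np.searchsorted(modules, labels)
        current_score = mean_conn[np.arange(n_genes), current_idx]
        best_idx = mean_conn.argmax(axis=1)
        best_score = mean_conn.max(axis=1)

        # Assumption: a gene moves only if another module is strictly better.
        new_idx = np.where(best_score > current_score, best_idx, current_idx)
        new_labels = modules[new_idx]

        if np.array_equal(new_labels, labels):
            break                        # converged: no gene changed module
        labels = new_labels

    return labels, passes


if __name__ == "__main__":
    # Toy similarity matrix with two tight groups of three genes each.
    toy = np.full((6, 6), 0.1)
    toy[:3, :3] = 0.9
    toy[3:, 3:] = 0.9
    np.fill_diagonal(toy, 1.0)
    # Gene 2 starts in the wrong module.
    final_labels, n_passes = k_module_reassign(toy, [0, 0, 1, 1, 1, 1])
    print(final_labels, n_passes)
```

Because every gene is updated in the same pass, this sketch can in principle oscillate on some inputs, which is why the pass count is capped. The number of modules never grows, although a module can become empty if all its genes move elsewhere.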