Zhang Shuqin, Zhao Hongyu, Ng Michael K
IEEE/ACM Trans Comput Biol Bioinform. 2015 Sep-Oct;12(5):1146-60. doi: 10.1109/TCBB.2015.2396073.
Networks have become a general tool for studying the complex interactions among genes, proteins, and other small molecules. Modularity, a fundamental property of many biological networks, has been widely studied, and many computational methods have been proposed to identify the modules in an individual network. In many cases, however, a single network is insufficient for module analysis because of noise in the data or the parameter tuning involved in building the network. The availability of a large number of biological networks makes network integration possible. By integrating such networks, more informative modules for a specific disease can be derived from networks constructed from different tissues, and consistent factors across different diseases can be inferred. In this paper, we develop an effective method for identifying modules from multiple networks under different conditions. The problem is formulated as an optimization model that combines module identification in each individual network with alignment of the modules across networks. We propose an approximation algorithm based on eigenvector computation. In simulation studies, our method outperforms existing methods, especially when the underlying modules differ across the networks. We also applied our method to two groups of human gene coexpression networks: one for three different cancers, and one for three tissues from morbidly obese patients. In these two groups we identified, respectively, 13 modules with three complete subgraphs and 11 modules with two complete subgraphs. The modules were validated through Gene Ontology enrichment and KEGG pathway enrichment analysis. We also show that the main functions of most modules have already been reported by other researchers for the corresponding disease, which may provide a theoretical basis for studying the modules experimentally.
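The abstract does not spell out the optimization model, so the following is only a minimal sketch of the general idea of eigenvector-based module detection across several networks, not the authors' algorithm. It assumes all networks share the same node set, sums their Newman modularity matrices, and splits the nodes by the sign of the leading eigenvector. The function names and the toy networks are hypothetical and chosen for illustration.

```python
import numpy as np


def modularity_matrix(adj):
    """Newman modularity matrix B = A - k k^T / (2m) for an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    total_degree = degrees.sum()  # equals 2m for an undirected graph
    return adj - np.outer(degrees, degrees) / total_degree


def consensus_bipartition(adjacencies):
    """Split nodes into two groups using the summed modularity matrices.

    Assumption: every adjacency matrix is symmetric and indexed by the
    same nodes in the same order. This is a simple consensus baseline,
    not the model described in the abstract.
    """
    combined = sum(modularity_matrix(a) for a in adjacencies)
    # eigh returns eigenvalues in ascending order, so the last column
    # is the eigenvector of the largest eigenvalue.
    _, eigenvectors = np.linalg.eigh(combined)
    leading = eigenvectors[:, -1]
    return (leading >= 0).astype(int)


def two_module_network(size, bridge):
    """Toy network: two cliques of `size` nodes plus one bridging edge."""
    n = 2 * size
    adj = np.zeros((n, n))
    adj[:size, :size] = 1.0
    adj[size:, size:] = 1.0
    np.fill_diagonal(adj, 0.0)
    i, j = bridge
    adj[i, j] = adj[j, i] = 1.0  # condition-specific "noise" edge
    return adj


# Two conditions with the same underlying modules but different noise edges.
network_a = two_module_network(4, bridge=(3, 4))
network_b = two_module_network(4, bridge=(0, 7))
labels = consensus_bipartition([network_a, network_b])
print(labels)
```

The sign of an eigenvector is arbitrary, so which module receives label 0 or 1 may vary; only the grouping is meaningful. Finding more than two modules, or modules that differ between networks, would need a richer formulation than this sketch.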