Liu Yusong, Ye Xiufen, Yu Christina Y, Shao Wei, Hou Jie, Feng Weixing, Zhang Jie, Huang Kun
College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, 150001, Heilongjiang, China.
Indiana University School of Medicine, Indianapolis, IN, 46202, USA.
BMC Bioinformatics. 2021 Oct 25;22(Suppl 4):111. doi: 10.1186/s12859-021-03964-5.
Gene co-expression networks are widely studied in the biomedical field, and algorithms such as WGCNA and lmQCM have been developed to detect co-expressed modules. However, these algorithms have limitations, such as insufficient granularity and unbalanced module sizes, that restrict how much knowledge can be mined from the data. In addition, it is difficult to incorporate prior knowledge into current co-expression module detection algorithms.
In this paper, we propose a novel module detection algorithm, based on topology potential and spectral clustering, to detect co-expressed modules in gene co-expression networks. When tested on TCGA data, our method provides more complete coverage of genes, more balanced module sizes and finer granularity than current methods in detecting modules with a significant difference in overall survival. In addition, the proposed algorithm can identify modules by incorporating prior knowledge.
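The topology potential step can be illustrated with a minimal sketch. The abstract does not state the formula, so the Gaussian form below, the distance 1 - weight, the parameter `sigma`, the neighbour `threshold`, and the helper names `topology_potential` and `local_maxima` are all assumptions made for illustration, not the authors' implementation. The sketch scores each node of a toy fully connected weighted network and picks nodes whose potential is a local maximum as candidate module centers; the spectral clustering step is omitted.

```python
import math


def topology_potential(weights, sigma=0.3):
    """Return one potential value per node of a fully connected weighted network.

    weights[i][j] is a similarity in [0, 1] (for example |correlation|).
    ASSUMPTION: distance = 1 - weight and a Gaussian kernel with unit node mass;
    the paper may define the potential differently.
    """
    n = len(weights)
    potentials = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            distance = 1.0 - weights[i][j]
            total += math.exp(-((distance / sigma) ** 2))
        potentials.append(total)
    return potentials


def local_maxima(weights, potentials, threshold=0.5):
    """Return nodes whose potential is >= that of every strongly linked neighbour.

    ASSUMPTION: a neighbour is any node with weight >= threshold (hypothetical rule).
    """
    n = len(weights)
    seeds = []
    for i in range(n):
        neighbours = [j for j in range(n) if j != i and weights[i][j] >= threshold]
        if neighbours and all(potentials[i] >= potentials[j] for j in neighbours):
            seeds.append(i)
    return seeds


# Toy symmetric similarity matrix: genes 0-2 form one group, genes 3-4 another.
toy_weights = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.7, 0.1, 0.1],
    [0.8, 0.7, 1.0, 0.3, 0.1],
    [0.1, 0.1, 0.3, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.9, 1.0],
]

potentials = topology_potential(toy_weights)
seeds = local_maxima(toy_weights, potentials)
print(potentials)
print(seeds)
```

In a fuller pipeline, the number of seeds could plausibly inform the number of clusters passed to a spectral clustering routine run on the same similarity matrix; that coupling is also an assumption here rather than something the abstract specifies.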
In summary, we developed a method that extracts as much information as possible from networks, with increased input coverage and the ability to detect more size-balanced and finer-grained modules. In addition, our method can integrate data from different sources. It outperforms current methods in that it covers all input genes and achieves finer granularity. Moreover, the method is designed not only for gene co-expression networks but can also be applied to any general fully connected weighted network.