Department of Computer Science, Indiana University Bloomington, Bloomington, 47408, IN, USA.
Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, 46202, IN, USA.
BMC Bioinformatics. 2019 May 1;20(Suppl 7):196. doi: 10.1186/s12859-019-2733-5.
Gene Co-expression Network Analysis (GCNA) helps identify gene modules with potential biological functions and has become a popular method in bioinformatics and biomedical research. However, most current GCNA algorithms use correlation to build gene co-expression networks and identify modules of highly correlated genes. Finding novel, biologically meaningful modules requires looking beyond correlation and identifying gene modules with other similarity measures.
We propose a generalized gene co-expression analysis algorithm based on subspace clustering that can identify biologically meaningful gene co-expression modules whose genes are not all highly correlated. We use low-rank representation to construct gene co-expression networks and local maximal quasi-clique merger to identify gene co-expression modules. We applied our method to three large microarray datasets and one single-cell RNA sequencing dataset. We demonstrate that our method can identify gene modules with biological functions different from those found by current GCNA methods, as well as gene modules with prognostic value.
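To illustrate how low-rank representation can yield a gene-gene network without using correlation, the following is a minimal sketch and not the authors' implementation. It assumes noise-free data, for which the problem "minimize the nuclear norm of Z subject to X = XZ" has the known closed-form solution Z = V Vᵀ, where V holds the right singular vectors of X. The function name `lrr_affinity`, the rank tolerance, and the synthetic data are all hypothetical choices made for this example; real expression data would need a noise-tolerant solver.

```python
import numpy as np

def lrr_affinity(X, tol=1e-10):
    """Noise-free low-rank representation affinity (illustrative only).

    X has shape (n_samples, n_genes): each column is one gene's profile.
    Returns a symmetric (n_genes, n_genes) affinity matrix.
    """
    # Skinny SVD of the data matrix.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only directions with non-negligible singular values.
    rank = int(np.sum(s > tol * s[0]))
    V = Vt[:rank].T
    # Closed-form minimizer of ||Z||_* subject to X = X Z.
    Z = V @ V.T
    # Symmetrize into a non-negative edge-weight matrix.
    return np.abs(Z) + np.abs(Z.T)

# Synthetic example (assumed data): 10 genes measured in 20 samples,
# drawn from two independent 2-dimensional subspaces, 5 genes each.
rng = np.random.default_rng(0)
basis_a = rng.standard_normal((20, 2))
basis_b = rng.standard_normal((20, 2))
X = np.hstack([
    basis_a @ rng.standard_normal((2, 5)),
    basis_b @ rng.standard_normal((2, 5)),
])

W = lrr_affinity(X)
print(np.round(W, 2))
```

In this idealized setting the affinity matrix is block-diagonal: genes from the same subspace are connected even if their pairwise correlation is weak, while genes from different subspaces receive weights that are zero up to floating-point error. A module-detection step would then operate on `W` as the weighted network.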
The presented method takes advantage of subspace clustering to generate gene co-expression networks rather than using correlation as the similarity measure between genes. Our generalized GCNA method can provide new insights from gene expression datasets and serve as a complement to current GCNA algorithms.