State Key Laboratory of Nuclear Resources and Environment and School of Water Resources and Environmental Engineering, East China University of Technology, Nanchang, 330013, China.
State Key Laboratory of Nuclear Resources and Environment and School of Chemistry, Biology and Materials Science, East China University of Technology, Nanchang, 330013, China.
BMC Genom Data. 2021 Dec 10;22(Suppl 1):54. doi: 10.1186/s12863-021-01004-y.
Since genes involved in the same biological modules usually present correlated expression profiles, many computational methods have been proposed to identify gene functional modules from expression profile data. Recently, the Sparse Singular Value Decomposition (SSVD) method has been proposed to bicluster gene expression data and thereby identify gene modules. However, this model handles only gene expression data and cannot integrate gene interaction information. Ignoring prior gene interaction information may yield gene modules that are hard to interpret biologically.
In this paper, we develop a Sparse Network-regularized SVD (SNSVD) method that integrates gene expression data with a prior gene interaction network, derived from a protein-protein interaction network, to identify underlying gene functional modules. Results on a set of simulated data show that SNSVD is more effective than the traditional SVD-based methods. Further experimental results on real cancer genomic data show that most co-expressed modules are not only significantly enriched in GO/KEGG pathways, but also correspond to dense sub-networks in the prior gene interaction network. We also use our method to identify ten differentially co-expressed miRNA-gene modules by integrating matched miRNA and mRNA expression data of breast cancer from The Cancer Genome Atlas (TCGA). Several important breast cancer related miRNA-gene modules are discovered.
These results demonstrate that SNSVD can overcome the drawbacks of SSVD and capture more biologically relevant functional modules by incorporating a prior gene interaction network. The identified functional modules may provide a new perspective for understanding the diagnosis, occurrence and progression of cancer.
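To illustrate the general idea of combining a sparse rank-one SVD with a gene interaction network, the following is a minimal sketch under stated assumptions, not the SNSVD algorithm itself, which the abstract does not specify. It assumes a genes-by-samples matrix `X`, a symmetric adjacency matrix `A` for the gene network, a quadratic penalty `lam * u^T L u` with `L` the normalized graph Laplacian, and a simple "keep the k largest entries" rule as the sparsity step. The function names, the parameters `k_genes`, `k_samples`, and `lam`, and the toy data are all hypothetical.

```python
import numpy as np

def keep_top_k(z, k):
    """Zero all but the k largest-magnitude entries of z, then scale to unit length."""
    out = np.zeros_like(z, dtype=float)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

def normalized_laplacian(A):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} of adjacency A.
    Isolated genes get a diagonal entry of 1, which acts as a ridge penalty."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros(len(deg), dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]

def network_sparse_svd(X, A, k_genes, k_samples, lam=1.0, n_iter=50, tol=1e-6):
    """Heuristic rank-one sparse SVD with a network smoothness penalty on u.
    Assumed scheme (not taken from the paper): alternate
      u <- top-k of (I + lam*L)^{-1} X v,   v <- top-k of X^T u."""
    p, _ = X.shape
    K = np.eye(p) + lam * normalized_laplacian(A)
    # Initialize from the leading singular vectors of the ordinary SVD.
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]
    for _ in range(n_iter):
        u_new = keep_top_k(np.linalg.solve(K, X @ v), k_genes)
        v_new = keep_top_k(X.T @ u_new, k_samples)
        change = np.linalg.norm(u_new - u)
        u, v = u_new, v_new
        if change < tol:
            break
    d = u @ X @ v  # scale of the rank-one layer
    return u, d, v

# Toy data: a planted module of 10 genes (rows 0-9) over 15 samples (columns 0-14),
# with the same 10 genes forming a clique in the hypothetical network.
rng = np.random.default_rng(0)
p, n = 60, 40
X = 0.5 * rng.standard_normal((p, n))
X[:10, :15] += 2.0
A = np.zeros((p, p))
A[:10, :10] = 1.0
np.fill_diagonal(A, 0.0)

u, d, v = network_sparse_svd(X, A, k_genes=10, k_samples=15, lam=1.0)
module_genes = np.flatnonzero(u)
module_samples = np.flatnonzero(v)
print("genes in module:", module_genes)
print("samples in module:", module_samples)
```

The nonzero entries of `u` and `v` define one bicluster (a gene module and the samples in which it is co-expressed). With `lam=0` the sketch reduces to a plain sparse rank-one SVD; increasing `lam` favors gene loadings that vary smoothly over connected genes, which is the intuition behind using the network as a prior.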