Zhao Haitao, Datta Sujay, Duan Zhong-Hui
Department of Mathematics and Computer Science, The University of North Carolina at Pembroke, Pembroke, NC, USA.
Department of Statistics, The University of Akron, Akron, OH, USA.
Bioinform Biol Insights. 2023 Feb 27;17:11779322231152972. doi: 10.1177/11779322231152972. eCollection 2023.
Global genetic networks provide information for the analysis of human diseases beyond what traditional analyses focused on single genes or local networks can offer. The Gaussian graphical model (GGM) is widely used to learn genetic networks because it defines an undirected graph that encodes the conditional dependence between genes. Many GGM-based algorithms have been proposed for learning genetic network structures. Because the number of gene variables typically far exceeds the number of samples collected, and real genetic networks are typically sparse, the graphical lasso implementation of the GGM has become a popular tool for inferring conditional interdependence among genes. However, although graphical lasso performs well on low-dimensional data sets, it is computationally expensive and inefficient, or even unable to run, when applied directly to genome-wide gene expression data sets. In this study, the Monte Carlo Gaussian graphical model (MCGGM) method is proposed to learn global genetic networks. The method uses a Monte Carlo approach to sample subnetworks from genome-wide gene expression data and applies graphical lasso to learn the structure of each subnetwork. The learned subnetworks are then integrated to approximate a global genetic network. The proposed method was first evaluated on a relatively small real data set of RNA-seq expression levels; the results indicate that it has a strong ability to decode interactions with high conditional dependence among genes. The method was then applied to genome-wide RNA-seq expression data sets. Among the gene interactions with high interdependence in the estimated global networks, most of the predicted gene-gene interactions have been reported in the literature as playing important roles in various human cancers. The results also validate the ability and reliability of the proposed method to identify high conditional dependences among genes in large-scale data sets.
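The abstract describes MCGGM only at a high level: sample gene subsets at random, fit graphical lasso to each subset, and integrate the results. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes scikit-learn's `GraphicalLasso` as the graphical-lasso solver, uniform random gene subsets of a fixed size, and an integration rule that averages each gene pair's partial correlation over the draws that contained both genes. The function names `mc_ggm` and `partial_correlations` and the parameters `subset_size`, `n_draws`, `alpha`, and `seed` are hypothetical illustrations.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def partial_correlations(precision):
    """Turn a precision matrix Theta into partial correlations -Theta_ij / sqrt(Theta_ii * Theta_jj)."""
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def mc_ggm(X, subset_size=10, n_draws=200, alpha=0.1, seed=0):
    """Hypothetical MCGGM-style sketch: many small graphical-lasso fits merged into one network.

    X is an (n_samples, n_genes) expression matrix with no constant genes.
    Returns an (n_genes, n_genes) matrix of averaged partial correlations;
    gene pairs never drawn together, and the diagonal, are left at 0.
    """
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    # Standardize each gene so that one alpha is comparable across subnetworks.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    total = np.zeros((n_genes, n_genes))
    counts = np.zeros((n_genes, n_genes))
    for _ in range(n_draws):
        # Monte Carlo step (assumed uniform): draw a random gene subset without replacement.
        genes = rng.choice(n_genes, size=subset_size, replace=False)
        # Learn the subnetwork structure with graphical lasso.
        model = GraphicalLasso(alpha=alpha).fit(Z[:, genes])
        block = np.ix_(genes, genes)
        total[block] += partial_correlations(model.precision_)
        counts[block] += 1
    # Integration step (assumed): average each pair over the draws that contained it.
    net = np.divide(total, counts, out=np.zeros_like(total), where=counts > 0)
    np.fill_diagonal(net, 0.0)
    return net


if __name__ == "__main__":
    # Synthetic data with two planted dependencies: genes 0-1 and genes 4-5.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 30))
    X[:, 1] = X[:, 0] + 0.3 * rng.standard_normal(200)
    X[:, 5] = X[:, 4] + 0.3 * rng.standard_normal(200)
    net = mc_ggm(X, subset_size=10, n_draws=300, alpha=0.1)
    i, j = np.unravel_index(np.argmax(np.abs(net)), net.shape)
    print("strongest predicted interaction:", int(i), int(j), round(float(net[i, j]), 3))
```

Averaging partial correlations keeps the sign of each predicted interaction. Counting how often an edge survives the lasso penalty across draws would be an equally plausible integration rule, and the abstract does not specify which rule the authors use.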