Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America.
Department of Statistics, Pennsylvania State University, University Park, Pennsylvania, United States of America.
PLoS Comput Biol. 2023 Jan 6;19(1):e1010758. doi: 10.1371/journal.pcbi.1010758. eCollection 2023 Jan.
Inferring gene co-expression networks is a useful process for understanding gene regulation and pathway activity. The networks are usually undirected graphs in which genes are represented as nodes and an edge represents a significant co-expression relationship. When expression data for multiple (p) genes in multiple (K) conditions (e.g., treatments, tissues, strains) are available, joint estimation of the networks, which harnesses information shared across conditions, can significantly increase the power of the analysis. In addition, examining condition-specific patterns of co-expression can provide insights into the underlying cellular processes activated in a particular condition. Condition adaptive fused graphical lasso (CFGL) is an existing method that incorporates condition specificity in a fused graphical lasso (FGL) model for estimating multiple co-expression networks. However, with a computational complexity of O(p^2 K log K), the current implementation of CFGL is prohibitively slow even for a moderate number of genes and can only be used for a maximum of three conditions. In this paper, we propose a faster alternative to CFGL named rapid condition adaptive fused graphical lasso (RCFGL). In RCFGL, we incorporate the condition specificity into another popular model for joint network estimation, known as fused multiple graphical lasso (FMGL). We use a more efficient algorithm in the iterative steps than CFGL does, enabling faster computation with a complexity of O(p^2 K) and making the method easily generalizable to more than three conditions. We also present a novel screening rule to determine whether the full network estimation problem can be broken down into estimation of smaller disjoint sub-networks, thereby reducing the complexity further. We demonstrate the computational advantage and superior performance of our method compared to two non-condition adaptive methods, FGL and FMGL, and one condition adaptive method, CFGL, in both a simulation study and real data analysis.
We used RCFGL to jointly estimate the gene co-expression networks in different brain regions (conditions) using a cohort of heterogeneous stock rats. We also provide an accompanying C and Python based package that implements RCFGL.
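As an illustration only of how a screening step can split network estimation into disjoint sub-problems, the sketch below applies the classical single-network graphical lasso screening idea (threshold the absolute sample covariances at the sparsity penalty, then take connected components) to each condition and merges the results. This is not the RCFGL screening rule, which the abstract does not specify; the function name `screen_blocks`, the argument `cov_list`, and the threshold `lam` are hypothetical names chosen for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def screen_blocks(cov_list, lam):
    """Group genes into candidate blocks that could be estimated separately.

    cov_list : list of K (p x p) sample covariance matrices, one per condition
               (hypothetical input layout, assumed for this sketch).
    lam      : sparsity penalty used as the screening threshold (assumption).

    Returns (n_blocks, labels), where labels[i] is the block index of gene i.
    """
    p = cov_list[0].shape[0]
    adjacency = np.zeros((p, p), dtype=bool)

    # Keep a candidate edge between two genes if their absolute sample
    # covariance exceeds the threshold in at least one condition.
    for S in cov_list:
        adjacency |= np.abs(S) > lam

    # Self-loops carry no information about block membership.
    np.fill_diagonal(adjacency, False)

    # Connected components of the thresholded graph define the blocks.
    graph = csr_matrix(adjacency.astype(np.int8))
    n_blocks, labels = connected_components(graph, directed=False)
    return n_blocks, labels
```

Genes that receive different labels would then be treated as separate, smaller estimation problems, which is where the reduction in computation comes from.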