Arbet Jaron, Zhuang Yaxu, Litkowski Elizabeth, Saba Laura, Kechris Katerina
Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States.
Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States.
Front Genet. 2021 May 19;12:630215. doi: 10.3389/fgene.2021.630215. eCollection 2021.
Genes often work together to perform complex biological processes, and "networks" provide a versatile framework for representing the interactions between multiple genes. Differential network analysis (DiNA) quantifies how this network structure differs between two or more groups/phenotypes (e.g., disease subjects and healthy controls), with the goal of determining whether differences in network structure can help explain differences between phenotypes. In this paper, we focus on gene co-expression networks, although in principle the methods studied can be used for DiNA of other types of features (e.g., metabolome, epigenome, microbiome, proteome). Three common applications of DiNA involve (1) testing whether the connections to a single gene differ between groups, (2) testing whether the connection between a pair of genes differs between groups, or (3) testing whether the connections within a "module" (a subset of 3 or more genes) differ between groups. This article focuses on the third application, as there is a lack of studies comparing statistical methods for identifying differentially co-expressed modules (DCMs). Through extensive simulations, we compare several previously proposed test statistics and a new p-norm difference test (PND). We demonstrate that the true positive rate of the proposed PND test is competitive with, and often higher than, that of the other methods, while controlling the false positive rate. The R package discoMod (differentially co-expressed modules) implements the proposed method and provides a full pipeline for identifying DCMs: clustering tools to derive gene modules, tests to identify DCMs, and methods for visualizing the results.
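To make the idea of a module-level p-norm difference test concrete, the following is a minimal Python sketch, not the discoMod implementation. It assumes that the statistic is the p-norm of the differences between the two groups' pairwise Pearson correlations within a module, and that significance is assessed by permuting group labels; the function names, the default p = 2, and the permutation scheme are all illustrative assumptions rather than details stated in the abstract.

```python
import numpy as np


def pnd_statistic(x1, x2, p=2.0):
    """p-norm of the between-group differences in gene-gene correlations.

    x1, x2: (samples x genes) expression matrices for one module,
    one matrix per group. Assumed form of the statistic, for illustration.
    """
    r1 = np.corrcoef(x1, rowvar=False)  # genes are columns
    r2 = np.corrcoef(x2, rowvar=False)
    iu = np.triu_indices(r1.shape[0], k=1)  # each gene pair counted once
    diff = np.abs(r1[iu] - r2[iu])
    return float(np.sum(diff ** p) ** (1.0 / p))


def pnd_permutation_test(x1, x2, p=2.0, n_perm=1000, seed=0):
    """Permutation p-value for the statistic above (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    observed = pnd_statistic(x1, x2, p)
    pooled = np.vstack([x1, x2])
    n1 = x1.shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Shuffle group labels by permuting the pooled samples.
        idx = rng.permutation(pooled.shape[0])
        stat = pnd_statistic(pooled[idx[:n1]], pooled[idx[n1:]], p)
        exceed += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `pnd_permutation_test(x1, x2)` on two samples-by-genes matrices for the same module returns the observed statistic and a permutation p-value. Larger values of p put more weight on the gene pairs whose correlations change the most, while smaller values spread the weight across many modest changes.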