Melo Diogo, Pallares Luisa F, Ayroles Julien F
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America.
Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey, United States of America.
PLoS Comput Biol. 2024 Jul 29;20(7):e1012300. doi: 10.1371/journal.pcbi.1012300. eCollection 2024 Jul.
Finding communities in gene co-expression networks is a common first step toward extracting biological insight from these complex datasets. Most community detection algorithms expect genes to be organized into assortative modules, that is, groups of genes that are more associated with each other than with genes in other groups. While it is reasonable to expect that these modules exist, using methods that assume they exist a priori is risky, as it guarantees that alternative organizations of gene interactions will be ignored. Here, we ask: can we find meaningful communities without imposing a modular organization on gene co-expression networks, and how modular are these communities? For this, we use a recently developed community detection method, the weighted degree corrected stochastic block model (SBM), that does not assume that assortative modules exist. Instead, the SBM attempts to efficiently use all information contained in the co-expression network to separate the genes into hierarchically organized blocks of genes. Using RNAseq gene expression data measured in two tissues derived from an outbred population of Drosophila melanogaster, we show that (a) the SBM is able to find ten times as many groups as competing methods, that (b) several of those gene groups are not modular, and that (c) the functional enrichment for non-modular groups is as strong as for modular communities. These results show that the transcriptome is structured in more complex ways than traditionally thought and that we should revisit the long-standing assumption that modularity is the main driver of the structuring of gene co-expression networks.
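To make the term "assortative module" concrete, the minimal sketch below scores a fixed partition of a toy weighted network with Newman's modularity Q. It is not the SBM and not the authors' code. The function names `modularity` and `block_network`, the six-node networks, and the choice of Q as the measure of assortativity are all illustrative assumptions. It shows that the same two groups can be strongly modular (dense within groups) or clearly non-modular (dense only between groups), and the second case is the kind of structure a method that presupposes assortative modules is not designed to recover.

```python
def modularity(weights, labels):
    """Newman's weighted modularity Q for a symmetric weight matrix."""
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree of each node
    two_m = sum(strength)                     # total weight, each edge counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected from node strengths alone
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


def block_network(labels, within, between):
    """Hypothetical toy network: one edge weight inside groups, another between groups."""
    n = len(labels)
    return [
        [0.0 if i == j else (within if labels[i] == labels[j] else between)
         for j in range(n)]
        for i in range(n)
    ]


labels = [0, 0, 0, 1, 1, 1]

# Assortative case: genes associate only with genes in their own group.
assortative = block_network(labels, within=1.0, between=0.0)

# Non-assortative case: genes associate only with genes in the other group.
disassortative = block_network(labels, within=0.0, between=1.0)

q_assortative = modularity(assortative, labels)
q_disassortative = modularity(disassortative, labels)

print(f"assortative Q = {q_assortative:.2f}")
print(f"disassortative Q = {q_disassortative:.2f}")
```

In this toy case the assortative partition has Q = 0.5 and the disassortative one has Q = -0.5. Both partitions describe the network's structure equally well, yet only the first would look like a good answer to a method that maximizes modularity.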