Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02142, USA; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02142, USA.
Am J Hum Genet. 2019 May 2;104(5):896-913. doi: 10.1016/j.ajhg.2019.03.020.
Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from the baseline-LD model), a stringent step that greatly reduced the number of pathways detected; most significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, emphasizing the importance of accounting for known annotations.
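The two statistics the abstract relies on, heritability enrichment of an annotation and an FDR < 5% significance cutoff, can be illustrated with a minimal Python sketch. It makes three assumptions. First, enrichment is taken as the proportion of heritability in an annotation divided by the proportion of SNPs in it, with probabilistic annotations weighted by their per-SNP values. Second, the FDR procedure is assumed to be Benjamini-Hochberg, which the abstract does not specify. Third, per-SNP heritabilities are supplied as given inputs, so the sketch does not fit the stratified LD score regression itself. The function names are hypothetical.

```python
import numpy as np


def heritability_enrichment(annot, per_snp_h2):
    """Enrichment = (share of h2 in the annotation) / (share of SNPs in it).

    annot: per-SNP annotation values in [0, 1] (binary or probabilistic).
    per_snp_h2: per-SNP heritability estimates, assumed already computed
        (e.g. from a fitted model); this sketch does not run S-LDSC.
    """
    annot = np.asarray(annot, dtype=float)
    per_snp_h2 = np.asarray(per_snp_h2, dtype=float)
    # Probabilistic annotations contribute fractionally to both shares.
    prop_h2 = np.sum(annot * per_snp_h2) / np.sum(per_snp_h2)
    prop_snps = np.sum(annot) / annot.size
    return prop_h2 / prop_snps


def benjamini_hochberg(pvals, fdr=0.05):
    """Boolean mask of hypotheses rejected at the given FDR (BH step-up).

    Assumption: BH is used here only for illustration.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Rank-specific thresholds: fdr * i / m for i = 1..m.
    thresholds = fdr * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: reject every hypothesis up to the largest passing rank.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    # Toy data: half the SNPs are annotated and carry 60% of the heritability.
    print(heritability_enrichment([1, 1, 0, 0], [3.0, 3.0, 2.0, 2.0]))
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2]))
```

In the toy example, an annotation covering 50% of SNPs that holds 60% of the heritability has an enrichment of 1.2. Testing many pathway-trait pairs (18,119 pathways across 42 traits) is what makes a multiple-testing correction such as an FDR cutoff necessary.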