School of Bioscience, Systems Biology Research Center, University of Skövde, Skövde, Sweden.
Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden.
BMC Genomics. 2021 Aug 30;22(1):631. doi: 10.1186/s12864-021-07935-1.
Few, if any, practical guidelines exist for predictive and falsifiable multi-omic data integration that systematically incorporates existing knowledge. Disease modules are a popular concept for interpreting genome-wide studies in medicine, but they have so far not been systematically evaluated, and they may yield corroborating multi-omic modules.
We assessed eight module identification methods in 57 previously published expression and methylation studies of 19 diseases using GWAS enrichment analysis. Next, we applied the same strategy to the multi-omic integration of 20 multiple sclerosis (MS) datasets, and further validated the resulting module using both GWAS genes and risk-factor-associated genes from several independent cohorts. Our benchmark showed that, in immune-associated diseases, modules inferred by clique-based methods were the most enriched for GWAS genes. The multi-omic case study using MS data robustly identified a module of 220 genes. Strikingly, most genes of the module were differentially methylated upon the action of one or several environmental risk factors in MS (n = 217, P = 10) and were also independently validated for association with five different risk factors of MS, which further underscores the high genetic and epigenetic relevance of the module for MS.
We believe our analysis provides a workflow for selecting modules, and our benchmark study may help to further improve disease module methods. Moreover, we stress that our methodology is generally applicable for combining and assessing the performance of multi-omic approaches for complex diseases.
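To make the idea of scoring a module by its GWAS enrichment concrete, the following is a minimal sketch of a one-sided hypergeometric over-representation test. This is an assumption chosen for illustration, not necessarily the enrichment procedure used in the study. The function name `gwas_enrichment` and the toy gene identifiers are hypothetical.

```python
from math import comb


def gwas_enrichment(module, gwas_genes, background):
    """Return (overlap, p_value) for GWAS genes over-represented in a module.

    Hypothetical helper: a plain hypergeometric upper-tail test, assuming
    every background gene is equally likely to be drawn into the module.
    """
    background = set(background)
    module = set(module) & background       # ignore genes outside the background
    gwas = set(gwas_genes) & background

    N = len(background)                     # population size
    K = len(gwas)                           # GWAS-associated genes in the population
    n = len(module)                         # module size (number of draws)
    k = len(module & gwas)                  # observed overlap

    # P(X >= k): sum the hypergeometric probabilities over the upper tail.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Toy data (invented for illustration only).
background = [f"g{i}" for i in range(20)]
gwas_genes = ["g0", "g1", "g2", "g3", "g4"]
module = ["g0", "g1", "g2", "g10"]

overlap, p_value = gwas_enrichment(module, gwas_genes, background)
print(overlap, p_value)
```

Under this sketch, candidate modules from different identification methods could be ranked by their p-values on the same background, with multiple-testing correction applied across methods and datasets.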