Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America.
PLoS Comput Biol. 2012;8(7):e1002587. doi: 10.1371/journal.pcbi.1002587. Epub 2012 Jul 5.
With the recent success of genome-wide association studies (GWAS), a wealth of association data has accumulated for more than 200 complex diseases/traits, creating a strong demand for data integration and interpretation. Combinatory analysis of multiple GWAS datasets, and integrative analysis of GWAS data with other high-throughput data, have been particularly promising. In this study, we proposed an integrative analysis framework for multiple GWAS datasets that overlays association signals onto the protein-protein interaction network, and we demonstrated it using schizophrenia datasets. Building on a dense module search algorithm, we first searched each individual GWAS dataset for subnetworks significantly enriched for schizophrenia association and then implemented a discovery-evaluation strategy to identify module genes with consistent association signals. We validated the module genes in an independent dataset and also examined them through meta-analysis of the related SNPs across multiple GWAS datasets. As a result, we identified 205 module genes whose joint effect was significantly associated with schizophrenia; these module genes included a number of well-studied candidate genes such as DISC1, GNA12, GNA13, GNAI1, GPR17, and GRIN2B. Further functional analysis suggested that these genes are involved in neuron-related processes. Additionally, meta-analysis found that 18 SNPs in 9 module genes had P(meta) < 1 × 10⁻⁴, including the gene HLA-DQA1, located in the MHC region on chromosome 6, which was reported in previous studies using the largest cohort of schizophrenia patients to date. These results demonstrate that our bi-directional network-based strategy is efficient for identifying disease-associated genes with modest signals in GWAS datasets. This approach can be applied to any other complex disease or trait for which multiple GWAS datasets are available.
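The abstract does not spell out how the dense module search works. The following is a minimal sketch of a greedy dense module search of the general kind described, under stated assumptions that are not taken from the paper: gene-level P-values are mapped to z-scores with z = Φ⁻¹(1 − p); a module of k genes is scored as Z_m = Σz_i / √k; and a module grows from a seed gene by adding the direct protein-protein interaction neighbor that most increases Z_m, stopping once the improvement is no larger than a fraction r (here 0.1) of the current score. The function names `gene_z`, `module_score`, and `grow_module`, the parameter `r`, and the toy network and P-values are all hypothetical and for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def gene_z(p_value: float) -> float:
    """Map a gene-level P-value to a z-score, z = inverse normal CDF of (1 - p) (assumed mapping)."""
    return NormalDist().inv_cdf(1.0 - p_value)


def module_score(module: set[str], z: dict[str, float]) -> float:
    """Assumed module score: Z_m = sum of member z-scores divided by sqrt(k), k = module size."""
    return sum(z[g] for g in module) / sqrt(len(module))


def grow_module(
    seed: str,
    network: dict[str, list[str]],
    z: dict[str, float],
    r: float = 0.1,  # hypothetical minimum relative improvement required to add a gene
) -> tuple[set[str], float]:
    """Greedily expand a module from `seed` over direct network neighbors (a sketch)."""
    module = {seed}
    score = module_score(module, z)
    while True:
        # Candidate genes: direct neighbors of current members that are not yet
        # in the module and have a z-score available.
        candidates = sorted(
            {n for g in module for n in network.get(g, ()) if n not in module and n in z}
        )
        if not candidates:
            break
        # Pick the neighbor giving the highest module score; ties go to the
        # alphabetically first gene because the candidates are sorted.
        best = max(candidates, key=lambda n: module_score(module | {n}, z))
        best_score = module_score(module | {best}, z)
        # Stop unless the score improves by more than a fraction r of its magnitude.
        if best_score <= score + r * abs(score):
            break
        module.add(best)
        score = best_score
    return module, score


# Toy example: a hypothetical 4-gene network with made-up gene-level P-values.
network = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
pvals = {"A": 0.001, "B": 0.01, "C": 0.5, "D": 0.02}
z = {g: gene_z(p) for g, p in pvals.items()}
module, score = grow_module("A", network, z)
print(sorted(module), round(score, 3))
```

On this toy input the module grows from A to {A, B, D}; gene C is never added because its z-score of 0 would dilute Z_m. Assessing module significance (for example, against permuted gene scores) and the paper's discovery-evaluation step across datasets are not shown here.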