Deng Qiaolan, Gupta Arkobrato, Jeon Hyeongseon, Nam Jin Hyun, Yilmaz Ayse Selen, Chang Won, Pietrzak Maciej, Li Lang, Kim Hang J, Chung Dongjun
The Interdisciplinary PhD Program in Biostatistics, The Ohio State University, Columbus, OH, United States.
Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States.
Front Genet. 2023 Jul 12;14:1079198. doi: 10.3389/fgene.2023.1079198. eCollection 2023.
Genome-wide association studies (GWAS) have successfully identified a large number of genetic variants associated with traits and diseases. However, it remains challenging to fully understand the functional mechanisms underlying many associated variants, especially for variants shared across multiple phenotypes. To address this challenge, we propose graph-GPA 2.0 (GGPA 2.0), a statistical framework that integrates GWAS datasets for multiple phenotypes and incorporates functional annotations within a unified framework. Our simulation studies showed that incorporating functional annotation data using GGPA 2.0 not only improves the detection of disease-associated variants, but also provides a more accurate estimation of relationships among diseases. Next, we analyzed five autoimmune diseases and five psychiatric disorders with functional annotations derived from GenoSkyline and GenoSkyline-Plus, along with a prior disease graph generated by biomedical literature mining. For autoimmune diseases, GGPA 2.0 identified enrichment for blood-related epigenetic marks, especially B cells and regulatory T cells, across multiple diseases. Psychiatric disorders were enriched for brain-related epigenetic marks, especially the prefrontal cortex and the inferior temporal lobe for bipolar disorder and schizophrenia, respectively. In addition, GGPA 2.0 detected pleiotropy between bipolar disorder and schizophrenia. Finally, we found that GGPA 2.0 is robust to the use of irrelevant and/or incorrect functional annotations. These results demonstrate that GGPA 2.0 can be a powerful tool to identify genetic variants associated with each phenotype or those shared across multiple phenotypes, while also promoting an understanding of functional mechanisms underlying the associated variants.