Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
Am J Hum Genet. 2023 Nov 2;110(11):1863-1874. doi: 10.1016/j.ajhg.2023.09.015. Epub 2023 Oct 24.
Genome-wide association studies (GWASs) across thousands of traits have revealed the pervasive pleiotropy of trait-associated genetic variants. While methods have been proposed to characterize pleiotropic components across groups of phenotypes, scaling these approaches to ultra-large-scale biobanks has been challenging. Here, we propose FactorGo, a scalable variational factor analysis model to identify and characterize pleiotropic components using biobank GWAS summary data. In extensive simulations, we observe that FactorGo outperforms the state-of-the-art (model-free) approach tSVD in capturing latent pleiotropic factors across phenotypes while maintaining a similar computational cost. We apply FactorGo to estimate 100 latent pleiotropic factors from GWAS summary data of 2,483 phenotypes measured in European-ancestry Pan-UK BioBank individuals (N = 420,531). Next, we find that factors from FactorGo are more enriched with relevant tissue-specific annotations than those identified by tSVD (p = 2.58E-10) and validate our approach by recapitulating brain-specific enrichment for BMI and the height-related connection between reproductive system and muscular-skeletal growth. Finally, our analyses suggest shared etiologies between rheumatoid arthritis and periodontal condition in addition to alkaline phosphatase as a candidate prognostic biomarker for prostate cancer. Overall, FactorGo improves our biological understanding of shared etiologies across thousands of GWASs.
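To make the comparison in the abstract concrete, the following is a minimal sketch of the model-free baseline it names, truncated SVD (tSVD), applied to a phenotype-by-variant matrix of GWAS Z-scores. It is not FactorGo itself and not the authors' code. The function name `truncated_svd`, the simulated matrix, and its dimensions are illustrative assumptions, chosen far smaller than the 2,483 phenotypes and 100 factors analyzed in the study.

```python
import numpy as np


def truncated_svd(Z, k):
    """Rank-k truncated SVD of a (phenotypes x variants) Z-score matrix.

    Hypothetical helper for illustration only.
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    phenotype_scores = U[:, :k] * s[:k]          # phenotypes x k
    variant_loadings = Vt[:k].T                  # variants x k, orthonormal columns
    var_explained = s[:k] ** 2 / np.sum(s ** 2)  # share of total squared signal
    return phenotype_scores, variant_loadings, var_explained


# Simulated stand-in for GWAS summary statistics (assumed sizes, not real data):
# k latent pleiotropic factors plus unit-variance noise.
rng = np.random.default_rng(0)
n_pheno, n_var, k = 50, 400, 3
true_scores = rng.normal(size=(n_pheno, k))
true_loadings = rng.normal(size=(n_var, k))
Z = true_scores @ true_loadings.T + rng.normal(size=(n_pheno, n_var))

scores, loadings, var_explained = truncated_svd(Z, k)

# Relative reconstruction error of the rank-k approximation.
Z_hat = scores @ loadings.T
rel_err = np.linalg.norm(Z - Z_hat) / np.linalg.norm(Z)
print("variance explained per factor:", np.round(var_explained, 3))
print("relative reconstruction error:", round(float(rel_err), 3))
```

In this sketch every entry of `Z` is treated as equally reliable, which is what "model-free" amounts to here; a factor analysis model such as the one the abstract proposes replaces this plain decomposition with an explicit probabilistic model of the summary statistics.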