Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland.
Eur J Hum Genet. 2021 Feb;29(2):309-324. doi: 10.1038/s41431-020-00730-8. Epub 2020 Oct 27.
Multivariate methods are known to increase the statistical power to detect associations when phenotypes share a genetic basis. They have, however, lacked essential analytic tools to follow up and understand the biology underlying these associations. We developed a novel computational workflow for multivariate GWAS follow-up analyses, including fine-mapping and identification of the subset of traits driving associations (driver traits). Many follow-up tools require univariate regression coefficients, which are lacking from multivariate results. Our method overcomes this problem by using Canonical Correlation Analysis to turn each multivariate association into its optimal univariate Linear Combination Phenotype (LCP). This enables an LCP-GWAS, which in turn generates the statistics required for follow-up analyses. We applied our method to 12 highly correlated inflammatory biomarkers in a Finnish population-based study. Altogether, we identified 11 associations, four of which (F5, ABO, C1orf140 and PDGFRB) were not detected by biomarker-specific analyses. Fine-mapping identified 19 signals within the 11 loci, and driver trait analysis determined the traits contributing to the associations. A phenome-wide association study on the 19 representative variants from the signals in 176,899 individuals from the FinnGen study revealed 53 disease associations (p < 1 × 10). Several reported pQTLs in the 11 loci provided orthogonal evidence for the biologically relevant functions of the representative variants. Our novel multivariate analysis workflow provides a powerful addition to standard univariate GWAS analyses by enabling multivariate GWAS follow-up and thus promoting the advancement of powerful multivariate methods in genomics.
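To make the CCA step more concrete, here is a minimal Python/NumPy sketch. It is not the authors' implementation. It assumes complete data that has already been adjusted for covariates, a single lead variant per locus, and the hypothetical helper names `lcp_from_lead` and `lcp_gwas`.

When one side of the CCA is a single genotype vector g, there is only one canonical pair. In that case the trait weights are proportional to inv(S_YY) S_Yg, and the resulting LCP has the largest correlation with the variant of any linear combination of the traits. In the sketch, these weights are then held fixed and the LCP is regressed on each variant in the region. This yields the univariate effect sizes and standard errors that follow-up tools expect.

```python
import numpy as np


def lcp_from_lead(g, Y):
    """Hypothetical helper: CCA weights for the trait block when the other
    block is a single lead variant, plus the resulting LCP.

    With one genotype column, CCA has a single canonical pair and the
    trait-side weights are proportional to inv(S_YY) @ S_Yg.
    """
    g = np.asarray(g, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = len(g)
    gc = g - g.mean()
    Yc = Y - Y.mean(axis=0)
    S_YY = Yc.T @ Yc / (n - 1)          # trait covariance matrix
    S_Yg = Yc.T @ gc / (n - 1)          # trait-genotype covariances
    w = np.linalg.solve(S_YY, S_Yg)     # unnormalised canonical weights
    w = w / np.sqrt(w @ S_YY @ w)       # scale so the LCP has unit variance
    lcp = Yc @ w                        # the linear combination phenotype
    return w, lcp


def lcp_gwas(G, lcp):
    """Hypothetical helper: simple per-variant OLS of a fixed LCP on each
    column of G (n individuals x m variants), returning beta and SE arrays."""
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    Gc = G - G.mean(axis=0)
    yc = lcp - lcp.mean()
    ss_g = np.sum(Gc ** 2, axis=0)               # per-variant sum of squares
    beta = Gc.T @ yc / ss_g                      # OLS slope per variant
    resid = yc[:, None] - Gc * beta              # residuals, one column per variant
    se = np.sqrt(np.sum(resid ** 2, axis=0) / (n - 2) / ss_g)
    return beta, se
```

The weights come from the lead variant rather than from re-running CCA for every variant. This is a design choice of the sketch. It keeps all variants in the region on the same phenotype scale, so their statistics resemble single-trait GWAS summary statistics.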