Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.
Genomics plc, Oxford, OX1 1JD, UK.
Am J Hum Genet. 2021 Dec 2;108(12):2354-2367. doi: 10.1016/j.ajhg.2021.11.005. Epub 2021 Nov 24.
Whole-genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously and the potential to study many correlated phenotypes with shared genetic architecture provide opportunities for discovery not addressed by the traditional one variant, one phenotype association study. Here, we introduce a Bayesian model comparison approach called MRP (multiple rare variants and phenotypes) for rare-variant association studies that considers correlation, scale, and direction of genetic effects across a group of genetic variants, phenotypes, and studies, requiring only summary statistic data. We apply our method to exome sequencing data (n = 184,698) across 2,019 traits from the UK Biobank, aggregating signals in genes. MRP demonstrates an ability to recover signals such as associations between PCSK9 and LDL cholesterol levels. We additionally find MRP effective in conducting meta-analyses in exome data. Non-biomarker findings include associations between MC1R and red hair color and skin color, IL17RA and monocyte count, and IQGAP2 and mean platelet volume. Finally, we apply MRP in a multi-phenotype setting; after clustering the 35 biomarker phenotypes based on genetic correlation estimates, we find that joint analysis of these phenotypes results in substantial power gains for gene-trait associations, such as in TNFRSF13B in one of the clusters containing diabetes- and lipid-related traits. Overall, we show that the MRP model comparison approach improves upon useful features from widely used meta-analysis approaches for rare-variant association analyses and prioritizes protective modifiers of disease risk.
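As a rough illustration of what a Bayesian model comparison on summary statistics can look like, the sketch below computes a log Bayes factor for a group of variants from their estimated effect sizes and standard errors. It is not the MRP implementation. It assumes a simple normal model in which the estimates are distributed as N(beta, V) with V = diag(se^2), the null model fixes beta = 0, and the alternative places a N(0, U) prior on beta. The function names and the prior covariance `prior_cov` (standing in for U) are hypothetical choices made for this example; how MRP itself builds its prior from variant, phenotype, and study correlations is not specified in the abstract.

```python
import numpy as np


def log_mvn_pdf_zero_mean(x, cov):
    """Log density of a zero-mean multivariate normal N(0, cov) evaluated at x."""
    k = x.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    # Quadratic form x^T cov^{-1} x, computed without forming the inverse.
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)


def log_bayes_factor(beta_hat, se, prior_cov):
    """Log Bayes factor of the alternative (beta ~ N(0, prior_cov)) vs. the null (beta = 0).

    Illustrative assumption: estimation errors are independent across
    variants, so the sampling covariance V is diagonal.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    prior_cov = np.asarray(prior_cov, dtype=float)
    v = np.diag(se ** 2)
    # Marginally, beta_hat ~ N(0, V + U) under the alternative and N(0, V) under the null.
    log_alt = log_mvn_pdf_zero_mean(beta_hat, v + prior_cov)
    log_null = log_mvn_pdf_zero_mean(beta_hat, v)
    return log_alt - log_null


# Hypothetical summary statistics for three rare variants in one gene.
beta_hat = np.array([0.5, 0.4, 0.6])
se = np.array([0.1, 0.1, 0.1])
# Hypothetical prior: effect variance 0.04 with pairwise correlation 0.5.
prior_cov = 0.04 * (0.5 * np.eye(3) + 0.5 * np.ones((3, 3)))

print(log_bayes_factor(beta_hat, se, prior_cov))
```

A positive log Bayes factor favors the alternative model, so consistent, same-direction effects across the variants in a gene produce strong evidence even when each variant alone is weak. Off-diagonal entries in the prior covariance are where assumptions about correlated effects across variants (or, in a larger stacked vector, across phenotypes and studies) would enter.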