Institute of Translational Genomics, Helmholtz Zentrum München - German Research Center for Environmental Health, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany.
MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, UK.
Mol Metab. 2022 Jul;61:101509. doi: 10.1016/j.molmet.2022.101509. Epub 2022 Apr 30.
Deep sequencing offers unparalleled access to rare variants in human populations. Understanding their role in disease is a priority, yet prohibitive sequencing costs mean that many cohorts lack the sample size to discover these effects on their own. Meta-analysis of individual variant scores allows the combination of rare variants across cohorts and study of their aggregated effect at the gene level, boosting discovery power. However, the methods involved have largely not been field-tested. In this study, we aim to perform the first meta-analysis of gene-based rare variant aggregation optimal tests, applied to the human cardiometabolic proteome.
Here, we carry out this analysis across MANOLIS, Pomak and ORCADES, three isolated European cohorts with whole-genome sequencing (total N = 4,422). We examine the genetic architecture of 250 proteomic traits of cardiometabolic relevance. We use a containerised pipeline to harmonise variant lists across cohorts and define four sets of qualifying variants. For every gene, we interrogate protein-damaging variants, exonic variants, exonic and regulatory variants, and regulatory-only variants, using the CADD and Eigen scores to weight variants according to their predicted functional consequence. We perform single-cohort rare variant analysis and meta-analyse variant scores using the SMMAT package.
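As a rough illustration of what meta-analysing variant scores can mean at the gene level, the sketch below sums per-cohort score vectors and score covariance matrices over a harmonised variant list, then forms a weighted burden statistic. It is a minimal sketch under stated assumptions, not the SMMAT implementation: the function name `meta_burden_test` and its arguments are hypothetical, only a plain burden test is shown rather than the tests SMMAT provides, and the weights stand in for functional scores such as CADD or Eigen.

```python
import math
import numpy as np

def meta_burden_test(scores, covariances, weights):
    """Gene-level burden test from per-cohort summary statistics.

    scores      : list of 1-D arrays, one per cohort (score per variant)
    covariances : list of 2-D arrays, one per cohort (score covariance)
    weights     : 1-D array of per-variant weights (hypothetical stand-in
                  for functional scores)
    All inputs are assumed to follow the same harmonised variant order;
    a variant absent from a cohort contributes zeros for that cohort.
    """
    U = np.sum(scores, axis=0)        # summed score vector across cohorts
    V = np.sum(covariances, axis=0)   # summed covariance matrix across cohorts
    w = np.asarray(weights, dtype=float)

    burden_score = w @ U              # weighted sum of variant scores
    burden_var = w @ V @ w            # variance of that weighted sum
    chi2 = float(burden_score**2 / burden_var)

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Toy example: two cohorts, two variants, unit weights (made-up numbers).
toy_scores = [np.array([1.0, 2.0]), np.array([3.0, 0.0])]
toy_covs = [np.eye(2), np.eye(2)]
chi2, p_value = meta_burden_test(toy_scores, toy_covs, [1.0, 1.0])
print(chi2, p_value)
```

Summing scores and covariances before testing lets a variant seen in only one cohort still contribute to the gene-level statistic, which is why a harmonised variant list matters.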
We describe 5 rare variant pQTLs (RV-pQTL) which pass our stringent significance threshold (7.45 × 10) and quality control procedure. These were split between four cis signals for MARCO, TEK, MMP2 and MPO, and one trans association for GDF2 in the SERPINA11 gene. We show that the cis-MPO association, which was not detectable using the single-point data alone, is driven by 5 missense and frameshift variants. These include rs140636390 and rs119468010, which are specific to MANOLIS and ORCADES, respectively. We show how this kind of signal could improve the predictive accuracy of genetic factors in common complex disease such as stroke and cardiovascular disease.
Our proof-of-concept study demonstrates the power of gene-based meta-analyses for discovering disease-relevant associations complementing common-variant signals by incorporating population-specific rare variation.