Dalapati Trisha, Wang Liuyang, Jones Angela G, Cardwell Jonathan, Konigsberg Iain R, Bossé Yohan, Sin Don D, Timens Wim, Hao Ke, Yang Ivana, Ko Dennis C
Medical Scientist Training Program, Duke University School of Medicine, Durham, NC, USA.
Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC, USA.
medRxiv. 2024 Jul 14:2024.07.13.24310305. doi: 10.1101/2024.07.13.24310305.
Most genetic variants identified through genome-wide association studies (GWAS) are suspected to be regulatory in nature, but only a small fraction colocalize with expression quantitative trait loci (eQTLs, variants associated with expression of a gene). It is therefore hypothesized, but largely untested, that integrating disease GWAS with context-specific eQTLs will reveal the genes driving disease associations. We used colocalization and transcriptomic analyses to identify shared genetic variants and likely causal genes associated with critically ill COVID-19 and idiopathic pulmonary fibrosis (IPF). We first identified five genome-wide significant variants associated with both diseases. Four of these variants did not demonstrate clear colocalization between GWAS and healthy lung eQTL signals. Instead, two of the four colocalized only in cell type-specific and disease-specific eQTL datasets. These analyses pointed to higher expression from the C allele of rs12585036, in monocytes and in lung tissue primarily from smokers, which increased risk of IPF and decreased risk of critically ill COVID-19. We also found lower expression (and higher methylation at a specific CpG) from the G allele of rs12610495, acting in fibroblasts and in IPF lungs, which increased risk of both IPF and critically ill COVID-19. We further found differential expression of the identified causal genes in diseased lungs compared with non-diseased lungs, specifically in epithelial and immune cell types. These findings highlight the power of integrating GWAS, context-specific eQTLs, and transcriptomics of diseased tissue to harness human genetic variation, identifying causal genes and the contexts in which they function across multiple diseases.
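The abstract does not say which colocalization software or priors were used. As an illustration of the general idea only, the sketch below computes posterior probabilities for the five standard colocalization hypotheses from per-variant effect sizes and standard errors, using Wakefield approximate Bayes factors. The function names, the prior standard deviations (0.2 and 0.15), and the prior probabilities `p1`, `p2`, and `p12` are assumptions chosen for this example, not values taken from the study.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))


def log_abf(beta, se, prior_sd):
    """Wakefield log approximate Bayes factor for a single variant."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(gwas, eqtl, gwas_prior_sd=0.2, eqtl_prior_sd=0.15,
                     p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of the five colocalization hypotheses.

    gwas, eqtl: lists of (beta, se) tuples for the same variants, same order.
    H0: no association; H1: GWAS only; H2: eQTL only;
    H3: both traits, distinct causal variants; H4: one shared causal variant.
    All default priors are assumptions for this sketch.
    """
    if not gwas or len(gwas) != len(eqtl):
        raise ValueError("gwas and eqtl must be non-empty and equally long")
    l1 = [log_abf(b, s, gwas_prior_sd) for b, s in gwas]
    l2 = [log_abf(b, s, eqtl_prior_sd) for b, s in eqtl]
    s1, s2 = log_sum_exp(l1), log_sum_exp(l2)
    s12 = log_sum_exp([a + b for a, b in zip(l1, l2)])
    log_h = {
        "H0": 0.0,
        "H1": math.log(p1) + s1,
        "H2": math.log(p2) + s2,
        # All variant pairs minus the pairs where both traits pick the same variant.
        "H3": math.log(p1) + math.log(p2) + log_diff_exp(s1 + s2, s12),
        "H4": math.log(p12) + s12,
    }
    total = log_sum_exp(list(log_h.values()))
    return {h: math.exp(v - total) for h, v in log_h.items()}


# Toy data (hypothetical): both traits peak at the second variant.
SE = 0.05
toy_gwas = [(z * SE, SE) for z in (1.0, 6.0, 0.5, 0.2, 1.0)]
toy_eqtl = [(z * SE, SE) for z in (0.8, 5.5, 0.3, 0.1, 0.9)]
toy_result = coloc_posteriors(toy_gwas, toy_eqtl)
```

In this toy input both signals peak at the same variant, so most of the posterior mass falls on H4. Moving the eQTL peak to a different variant shifts the mass to H3, which is the pattern expected when a GWAS signal fails to colocalize with an eQTL signal in a given tissue or cell type.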