Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America.
Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America.
PLoS Genet. 2022 Jun 16;18(6):e1010251. doi: 10.1371/journal.pgen.1010251. eCollection 2022 Jun.
More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that trait-associated variants likely act in a tissue- or cell-type-specific fashion. Yet prioritizing trait-relevant tissues or cell types to elucidate disease etiology remains challenging. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific gene expression measurements from single-cell RNA sequencing (scRNA-seq). We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We apply our framework to multiple scRNA-seq datasets from different platforms and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and scRNA-seq datasets and further validated using PubMed search and existing bulk case-control testing results.
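The cell-type prioritization step relies on generalized least squares (GLS), which regresses gene-level association statistics on cell-type-specific expression while accounting for correlation between genes. The sketch below illustrates only the generic GLS idea and is not EPIC's implementation. The function name `gls_enrichment`, the simulated data, the AR(1)-style gene-gene covariance, and the normal-approximation one-sided p-value are all assumptions made for illustration.

```python
import math
import numpy as np


def gls_enrichment(y, x, sigma):
    """Hypothetical GLS enrichment test (illustration only, not EPIC's code).

    y     : gene-level association statistics, shape (n,)
    x     : one cell type's expression specificity per gene, shape (n,)
    sigma : assumed gene-gene covariance of y, shape (n, n)
    Returns (slope, one-sided p-value for slope > 0).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])  # intercept + cell-type expression
    # Whiten with the Cholesky factor (sigma = L L^T); ordinary least squares
    # on the whitened data equals GLS on the original data.
    L = np.linalg.cholesky(sigma)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    df = n - X.shape[1]
    s2 = float(resid @ resid) / df  # residual scale, estimated from the data
    se = math.sqrt(s2 * np.linalg.inv(Xw.T @ Xw)[1, 1])
    t = float(beta[1]) / se
    p = 0.5 * math.erfc(t / math.sqrt(2))  # one-sided normal approximation
    return float(beta[1]), p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, rho = 500, 0.5
    idx = np.arange(n)
    sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # assumed correlation between neighboring genes
    x = rng.standard_normal(n)                           # simulated expression specificity
    y = 0.5 * x + np.linalg.cholesky(sigma) @ rng.standard_normal(n)
    beta, p = gls_enrichment(y, x, sigma)
    print(f"beta = {beta:.3f}, one-sided p = {p:.2e}")
```

A positive slope with a small p-value would suggest that genes more specifically expressed in that cell type tend to carry stronger association signals. Whitening through the Cholesky factor keeps the example to plain least squares. With an identity covariance, the estimate reduces to an ordinary regression slope.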