Zhang Zixuan Eleanor, Kim Artem, Suboc Noah, Mancuso Nicholas, Gazal Steven
Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California.
Department of Quantitative and Computational Biology, University of Southern California.
medRxiv. 2025 Mar 5:2025.01.18.25320755. doi: 10.1101/2025.01.18.25320755.
Population-scale single-cell transcriptomic technologies (scRNA-seq) enable characterizing variant effects on gene regulation at the cellular level (e.g., single-cell eQTLs; sc-eQTLs). However, existing sc-eQTL mapping approaches are either not designed for analyzing sparse counts in scRNA-seq data or become intractable in extremely large datasets. Here, we propose jaxQTL, a flexible and efficient sc-eQTL mapping framework that fits count-based models to pseudobulk data. Using extensive simulations, we demonstrated that jaxQTL with a negative binomial model outperformed other models in identifying sc-eQTLs while maintaining a calibrated type I error rate. We applied jaxQTL across 14 cell types of the OneK1K scRNA-seq data (N=982) and identified 11-16% more eGenes than existing approaches, primarily driven by jaxQTL's ability to identify lowly expressed eGenes. We observed that fine-mapped sc-eQTLs were further from the transcription start site (TSS) than fine-mapped eQTLs identified in all cells (bulk-eQTLs; P=1x10) and more enriched in cell-type-specific enhancers (P=3x10), suggesting that sc-eQTLs improve our ability to identify distal eQTLs that are missed in bulk tissues. Overall, the genetic effects of fine-mapped sc-eQTLs were largely shared across cell types, with cell-type specificity increasing with distance to the TSS. Lastly, we observed that sc-eQTLs explain more SNP-heritability than bulk-eQTLs (9.90 ± 0.88% vs. 6.10 ± 0.76% when meta-analyzed across 16 blood and immune-related traits), improving but not closing the missing link between GWAS and eQTLs. As an example, we highlight that sc-eQTLs in T cells (unlike bulk-eQTLs) can successfully nominate a candidate gene for rheumatoid arthritis. Overall, jaxQTL provides an efficient and powerful approach using count-based models to identify missing disease-associated eQTLs.
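To make the idea of a count-based model on pseudobulk data concrete, the following is a minimal sketch in plain Python, not jaxQTL's actual implementation or API. It sums one gene's per-cell counts within each donor, then fits a negative binomial regression of the pseudobulk count on genotype dosage by Fisher scoring and reports a Wald test for the genotype effect. The function names (`pseudobulk`, `fit_negbin`, `wald_pvalue`), the library-size offset, the absence of other covariates, and the fixed dispersion `alpha` are all simplifying assumptions made for illustration; in practice the dispersion would be estimated and covariates would be included.

```python
import math


def pseudobulk(cell_counts, cell_donor):
    """Sum one gene's per-cell counts within each donor (for one cell type)."""
    totals = {}
    for count, donor in zip(cell_counts, cell_donor):
        totals[donor] = totals.get(donor, 0) + count
    return totals


def fit_negbin(y, g, lib_size, alpha, n_iter=50, tol=1e-10):
    """Fisher scoring for a negative binomial GLM with log link.

    Model (an assumption of this sketch):
        log(mu_i) = log(lib_size_i) + b0 + b1 * g_i
        Var(y_i)  = mu_i + alpha * mu_i**2, with alpha held fixed.
    Returns (b0, b1, standard error of b1).
    Requires sum(y) > 0 and at least two distinct genotype values.
    """
    b0 = math.log(sum(y) / sum(lib_size))  # start from the pooled rate
    b1 = 0.0
    for _ in range(n_iter):
        s00 = s01 = s11 = 0.0  # entries of the Fisher information X'WX
        u0 = u1 = 0.0          # entries of the score vector
        for yi, gi, li in zip(y, g, lib_size):
            mu = li * math.exp(b0 + b1 * gi)
            w = mu / (1.0 + alpha * mu)  # GLM working weight for the log link
            r = (yi - mu) / mu           # working residual
            s00 += w
            s01 += w * gi
            s11 += w * gi * gi
            u0 += w * r
            u1 += w * r * gi
        det = s00 * s11 - s01 * s01
        # Solve the 2x2 system (X'WX) * delta = score in closed form.
        d0 = (s11 * u0 - s01 * u1) / det
        d1 = (s00 * u1 - s01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    se_b1 = math.sqrt(s00 / det)  # from the inverse Fisher information
    return b0, b1, se_b1


def wald_pvalue(b1, se_b1):
    """Two-sided normal-approximation p-value for H0: b1 = 0."""
    z = abs(b1 / se_b1)
    return math.erfc(z / math.sqrt(2.0))


# Toy usage with made-up numbers: 8 donors, genotype dosages 0/1/2.
genotype = [0, 1, 2, 0, 1, 2, 0, 1]
library = [1000, 1200, 900, 1100, 1000, 1300, 950, 1050]
counts = [7, 11, 12, 8, 9, 17, 6, 10]
b0_hat, b1_hat, se_hat = fit_negbin(counts, genotype, library, alpha=0.1)
p_value = wald_pvalue(b1_hat, se_hat)
```

The sketch models counts directly with a log link and an offset, rather than normalizing and transforming expression before fitting a linear model, which is the distinction the abstract draws between count-based and existing approaches.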