Sheth Maya U, Qiu Wei-Lin, Rosa Ma X, Gschwind Andreas R, Jagoda Evelyn, Tan Anthony S, Einarsson Hjörleifur, Gorissen Bram L, Dubocanin Danilo, McGinnis Christopher S, Amgalan Dulguun, Satpathy Ansuman T, Jones Thouis R, Steinmetz Lars M, Kundaje Anshul, Ustun Berk, Engreitz Jesse M, Andersson Robin
bioRxiv. 2024 Nov 24:2024.11.23.624931. doi: 10.1101/2024.11.23.624931.
Mapping enhancers and their target genes in specific cell types is crucial for understanding gene regulation and human disease genetics. However, accurately predicting enhancer-gene regulatory interactions from single-cell datasets has been challenging. Here, we introduce a new family of classification models, scE2G, to predict enhancer-gene regulation. These models use features from single-cell ATAC-seq or multiomic RNA and ATAC-seq data and are trained on a CRISPR perturbation dataset including >10,000 evaluated element-gene pairs. We benchmark scE2G models against CRISPR perturbations, fine-mapped eQTLs, and GWAS variant-gene associations and demonstrate state-of-the-art performance at prediction tasks across multiple cell types and categories of perturbations. We apply scE2G to build maps of enhancer-gene regulatory interactions in heterogeneous tissues and interpret noncoding variants associated with complex traits, nominating regulatory interactions linking […] and […] to lymphocyte counts. The scE2G models will enable accurate mapping of enhancer-gene regulatory interactions across thousands of diverse human cell types.