The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America.
The Jackson Laboratory, Bar Harbor, Maine, United States of America.
PLoS Comput Biol. 2021 Dec 13;17(12):e1009670. doi: 10.1371/journal.pcbi.1009670. eCollection 2021 Dec.
Cis-regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n = 6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n = 40 samples) that were not used in model training (mean average precision = 0.80, mean F1 score = 0.70). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single-nucleus ATAC-seq (snATAC) data from human blood-derived immune cells, and these inferences overlapped with known functional annotations in sorted immune cells, establishing the efficacy of these models for studying cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.
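The two summary metrics quoted above (average precision and F1 score) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy labels, and the toy scores below are all hypothetical, and the example covers a single class (for instance "enhancer" versus "not enhancer") for one sample. A mean over classes or samples would then be taken over such per-class values.

```python
# Illustrative sketch only; not the CoRE-ATAC evaluation pipeline.
# All names and numbers here are hypothetical toy values.

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each true positive.

    Regions are ranked by descending score; tied scores keep their input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy example: four accessible regions, two of which are annotated enhancers.
y_true = [1, 0, 1, 0]            # hypothetical known annotations
scores = [0.9, 0.8, 0.7, 0.1]    # hypothetical predicted enhancer probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

ap = average_precision(y_true, scores)
f1 = f1_score(y_true, y_pred)
print(f"average precision = {ap:.3f}, F1 = {f1:.3f}")
```

Average precision summarizes ranking quality across all thresholds, while F1 depends on the single threshold used to call a region positive, which is why the two are reported side by side.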