CoRE-ATAC: A deep learning model for the functional classification of regulatory elements from single cell and bulk ATAC-seq data.

Affiliations

The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America.

The Jackson Laboratory, Bar Harbor, Maine, United States of America.

Publication information

PLoS Comput Biol. 2021 Dec 13;17(12):e1009670. doi: 10.1371/journal.pcbi.1009670. eCollection 2021 Dec.

Abstract

Cis-Regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n = 6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n = 40 samples) that were not used in model training (mean average precision = 0.80, mean F1 score = 0.70). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.
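
The abstract summarizes performance as a mean average precision and a mean F1 score. As a rough illustration of how such summary numbers can be computed, the sketch below calculates per-class average precision and F1 and then averages them across classes. It assumes one-vs-rest scoring per class, macro-averaging, and a 0.5 decision threshold; the abstract does not state the averaging scheme or threshold, and the function names and toy values are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one class: mean precision at the rank of each true positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def f1_score(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_metrics(
    y_true: Dict[str, List[int]],
    y_score: Dict[str, List[float]],
    threshold: float = 0.5,  # assumed cutoff, not from the paper
) -> Tuple[float, float]:
    """Average the per-class AP and F1 over all classes (macro-averaging, an assumption)."""
    aps = [average_precision(y_true[c], y_score[c]) for c in y_true]
    f1s = [
        f1_score(y_true[c], [int(s >= threshold) for s in y_score[c]])
        for c in y_true
    ]
    n = len(y_true)
    return sum(aps) / n, sum(f1s) / n


# Toy one-vs-rest labels and scores for four regions; values are made up.
toy_true = {
    "promoter": [1, 0, 1, 0],
    "enhancer": [0, 1, 1, 0],
    "insulator": [0, 0, 0, 1],
}
toy_score = {
    "promoter": [0.9, 0.2, 0.6, 0.4],
    "enhancer": [0.1, 0.8, 0.3, 0.7],
    "insulator": [0.2, 0.1, 0.3, 0.9],
}

mean_ap, mean_f1 = macro_metrics(toy_true, toy_score)
print(f"mean AP = {mean_ap:.2f}, mean F1 = {mean_f1:.2f}")
```

Average precision depends only on how the scores rank the regions, whereas F1 depends on the chosen threshold, which is one reason the two summary values can differ for the same predictions.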

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b090/8699717/2fcb7be45419/pcbi.1009670.g001.jpg
