A systematic evaluation of Hi-C data enhancement methods for enhancing PLAC-seq and HiChIP data.

Affiliations

Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, North Carolina 27599, USA.

State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, 510275 Guangzhou, China.

Publication information

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac145.

DOI:10.1093/bib/bbac145

PMID:35488276

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9116213/

Abstract

The three-dimensional organization of chromatin plays a critical role in gene regulation. Recently developed technologies, such as HiChIP and proximity ligation-assisted ChIP-Seq (PLAC-seq) (hereafter referred to as HP for brevity), can measure chromosome spatial organization by interrogating chromatin interactions mediated by a protein of interest. While offering cost-efficiency over genome-wide unbiased high-throughput chromosome conformation capture (Hi-C) data, HP data remain sparse at kilobase (Kb) resolution with the current sequencing depth on the order of 10^8 reads per sample. Deep learning models, including HiCPlus, HiCNN, HiCNN2, DeepHiC and Variationally Encoded Hi-C Loss Enhancer (VEHiCLE), have been developed to enhance the sequencing depth of Hi-C data, but their performance on HP data has not been benchmarked. Here, we performed a comprehensive evaluation of HP data sequencing depth enhancement using models developed for Hi-C data. Specifically, we analyzed various HP data, including Smc1a HiChIP data of the human lymphoblastoid cell line GM12878, H3K4me3 PLAC-seq data of four human neural cell types as well as of mouse embryonic stem cells (mESC), and mESC CCCTC-binding factor (CTCF) PLAC-seq data. Our evaluations lead to the following three findings: (i) most models developed for Hi-C data achieve reasonable performance when applied to HP data (e.g. with Pearson correlation ranging from 0.76 to 0.95 for pairs of loci within 300 Kb), and the enhanced datasets lead to improved statistical power for detecting long-range chromatin interactions, (ii) models trained on HP data outperform those trained on Hi-C data and (iii) most models are transferable across cell types. Our results provide a general guideline for HP data enhancement using existing methods designed for Hi-C data.
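The Pearson correlation "for pairs of loci within 300 Kb" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that contact maps are square matrices of binned counts, that the correlation is pooled over all upper-triangle bin pairs (diagonal included) whose genomic distance does not exceed the cutoff, and that the enhanced map is compared against a deeper-sequenced reference map. The function name `pearson_within_distance` and the toy matrices are hypothetical.

```python
import math


def pearson_within_distance(enhanced, reference, bin_size, max_dist=300_000):
    """Pearson correlation between two contact maps, restricted to bin pairs
    (i, j) with i <= j and (j - i) * bin_size <= max_dist.

    Assumption: pooling all distances into one correlation; an evaluation
    could equally compute one correlation per distance stratum.
    """
    n = len(reference)
    max_offset = max_dist // bin_size  # largest allowed bin separation
    xs, ys = [], []
    for i in range(n):
        for j in range(i, min(n, i + max_offset + 1)):
            xs.append(enhanced[i][j])
            ys.append(reference[i][j])
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy 4 x 4 maps at a hypothetical 100 Kb bin size.
reference = [
    [10, 5, 2, 1],
    [5, 10, 5, 2],
    [2, 5, 10, 5],
    [1, 2, 5, 10],
]
# "Enhanced" map: a linear rescaling of the reference ...
enhanced = [[2 * v + 1 for v in row] for row in reference]
# ... except for one distorted long-range entry (bins 0 and 3, 300 Kb apart).
enhanced[0][3] = 100

r_short = pearson_within_distance(enhanced, reference, 100_000, max_dist=200_000)
r_all = pearson_within_distance(enhanced, reference, 100_000, max_dist=300_000)
print(r_short, r_all)
```

With the 200 Kb cutoff the distorted entry falls outside the evaluated band, so only the linearly rescaled entries are compared. Raising the cutoff to 300 Kb brings it into the comparison and lowers the correlation.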

Similar articles

HPRep: Quantifying Reproducibility in HiChIP and PLAC-Seq Datasets.

Curr Issues Mol Biol. 2021 Sep 17;43(2):1156-1170. doi: 10.3390/cimb43020082.

Proximity Ligation-Assisted ChIP-Seq (PLAC-Seq).

Methods Mol Biol. 2021;2351:181-199. doi: 10.1007/978-1-0716-1597-3_10.

HPTAD: A computational method to identify topologically associating domains from HiChIP and PLAC-seq datasets.

Comput Struct Biotechnol J. 2023 Jan 9;21:931-939. doi: 10.1016/j.csbj.2023.01.003. eCollection 2023.

MAPS: Model-based analysis of long-range chromatin interactions from PLAC-seq and HiChIP experiments.

PLoS Comput Biol. 2019 Apr 15;15(4):e1006982. doi: 10.1371/journal.pcbi.1006982. eCollection 2019 Apr.

HiCNN: a very deep convolutional neural network to better enhance the resolution of Hi-C data.

Bioinformatics. 2019 Nov 1;35(21):4222-4228. doi: 10.1093/bioinformatics/btz251.

HiCNN2: Enhancing the Resolution of Hi-C Data Using an Ensemble of Convolutional Neural Networks.

Genes (Basel). 2019 Oct 30;10(11):862. doi: 10.3390/genes10110862.

DeepChIA-PET: Accurately predicting ChIA-PET from Hi-C and ChIP-seq with deep dilated networks.

PLoS Comput Biol. 2023 Jul 13;19(7):e1011307. doi: 10.1371/journal.pcbi.1011307. eCollection 2023 Jul.

7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs.

BMC Genomics. 2019 Oct 25;20(1):777. doi: 10.1186/s12864-019-6088-0.

Identification of significant chromatin contacts from HiChIP data by FitHiChIP.

Nat Commun. 2019 Sep 17;10(1):4221. doi: 10.1038/s41467-019-11950-y.

Cited by

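HPTAD: A computational method to identify topologically associating domains from HiChIP and PLAC-seq datasets.

Comput Struct Biotechnol J. 2023 Jan 9;21:931-939. doi: 10.1016/j.csbj.2023.01.003. eCollection 2023.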

Understanding the function of regulatory DNA interactions in the interpretation of non-coding GWAS variants.

Front Cell Dev Biol. 2022 Aug 19;10:957292. doi: 10.3389/fcell.2022.957292. eCollection 2022.

References

HPRep: Quantifying Reproducibility in HiChIP and PLAC-Seq Datasets.

Curr Issues Mol Biol. 2021 Sep 17;43(2):1156-1170. doi: 10.3390/cimb43020082.

EnHiC: learning fine-resolution Hi-C contact maps using a generative adversarial framework.

Bioinformatics. 2021 Jul 12;37(Suppl_1):i272-i279. doi: 10.1093/bioinformatics/btab272.

VEHiCLE: a Variationally Encoded Hi-C Loss Enhancement algorithm for improving and generating Hi-C data.

Sci Rep. 2021 Apr 23;11(1):8880. doi: 10.1038/s41598-021-88115-9.

Cell-type-specific 3D epigenomes in the developing human cortex.

Nature. 2020 Nov;587(7835):644-649. doi: 10.1038/s41586-020-2825-4. Epub 2020 Oct 14.

DeepHiC: A generative adversarial network for enhancing Hi-C data resolution.

PLoS Comput Biol. 2020 Feb 21;16(2):e1007287. doi: 10.1371/journal.pcbi.1007287. eCollection 2020 Feb.

HiCNN2: Enhancing the Resolution of Hi-C Data Using an Ensemble of Convolutional Neural Networks.

Genes (Basel). 2019 Oct 30;10(11):862. doi: 10.3390/genes10110862.

hicGAN infers super resolution Hi-C data with generative adversarial networks.

Bioinformatics. 2019 Jul 15;35(14):i99-i107. doi: 10.1093/bioinformatics/btz317.

The ENCODE Blacklist: Identification of Problematic Regions of the Genome.

Sci Rep. 2019 Jun 27;9(1):9354. doi: 10.1038/s41598-019-45839-z.

HiCNN: a very deep convolutional neural network to better enhance the resolution of Hi-C data.

Bioinformatics. 2019 Nov 1;35(21):4222-4228. doi: 10.1093/bioinformatics/btz251.

MAPS: Model-based analysis of long-range chromatin interactions from PLAC-seq and HiChIP experiments.

PLoS Comput Biol. 2019 Apr 15;15(4):e1006982. doi: 10.1371/journal.pcbi.1006982. eCollection 2019 Apr.
