A systematic evaluation of Hi-C data enhancement methods for enhancing PLAC-seq and HiChIP data.

Author affiliations

Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, North Carolina 27599, USA.

State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, 510275 Guangzhou, China.

Publication information

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac145.

Abstract

The three-dimensional organization of chromatin plays a critical role in gene regulation. Recently developed technologies, such as HiChIP and proximity ligation-assisted ChIP-Seq (PLAC-seq) (hereafter referred to as HP for brevity), can measure chromosome spatial organization by interrogating chromatin interactions mediated by a protein of interest. While more cost-efficient than genome-wide unbiased high-throughput chromosome conformation capture (Hi-C) data, HP data remain sparse at kilobase (Kb) resolution at current sequencing depths, which are on the order of 10^8 reads per sample. Deep learning models, including HiCPlus, HiCNN, HiCNN2, DeepHiC and Variationally Encoded Hi-C Loss Enhancer (VEHiCLE), have been developed to enhance the sequencing depth of Hi-C data, but their performance on HP data has not been benchmarked. Here, we performed a comprehensive evaluation of HP data sequencing depth enhancement using models developed for Hi-C data. Specifically, we analyzed various HP data, including Smc1a HiChIP data of the human lymphoblastoid cell line GM12878, H3K4me3 PLAC-seq data of four human neural cell types as well as of mouse embryonic stem cells (mESC), and mESC CCCTC-binding factor (CTCF) PLAC-seq data. Our evaluations lead to the following three findings: (i) most models developed for Hi-C data achieve reasonable performance when applied to HP data (e.g. with Pearson correlation ranging from 0.76 to 0.95 for pairs of loci within 300 Kb), and the enhanced datasets lead to improved statistical power for detecting long-range chromatin interactions, (ii) models trained on HP data outperform those trained on Hi-C data and (iii) most models are transferable across cell types. Our results provide a general guideline for HP data enhancement using existing methods designed for Hi-C data.
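
The abstract summarizes agreement between enhanced and reference data as a Pearson correlation over pairs of loci within 300 Kb. As a rough illustration of what such a distance-restricted correlation could look like, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names, the bin size, the choice to include the diagonal, and the toy matrices are all assumptions made for this example.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def short_range_pearson(enhanced, reference, bin_size_kb=10, max_dist_kb=300):
    """Correlate contact counts for bin pairs at most max_dist_kb apart.

    Both inputs are square, symmetric contact matrices (lists of lists)
    binned at bin_size_kb. Only the upper triangle, diagonal included,
    is used. The default bin size is an assumption for illustration.
    """
    max_offset = max_dist_kb // bin_size_kb
    n = len(reference)
    xs, ys = [], []
    for i in range(n):
        for j in range(i, min(n, i + max_offset + 1)):
            xs.append(enhanced[i][j])
            ys.append(reference[i][j])
    return pearson(xs, ys)

# Toy 4x4 symmetric matrices with made-up values (not real HP data).
reference = [[10, 6, 3, 1], [6, 9, 5, 2], [3, 5, 8, 4], [1, 2, 4, 7]]
# A positive linear rescaling of the reference, so the correlation is ~1.
enhanced = [[2 * v + 1 for v in row] for row in reference]
r = short_range_pearson(enhanced, reference, bin_size_kb=100, max_dist_kb=300)
```

In practice the two matrices would be a model-enhanced low-depth contact map and a deeply sequenced map of the same region, and the correlation might also be computed separately at each genomic distance rather than pooled as it is here.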

References cited in this article

5. DeepHiC: A generative adversarial network for enhancing Hi-C data resolution. PLoS Comput Biol. 2020 Feb 21;16(2):e1007287. doi: 10.1371/journal.pcbi.1007287. eCollection 2020 Feb.
