MaxHiC: A robust background correction model to identify biologically relevant chromatin interactions in Hi-C and capture Hi-C experiments.

Affiliations

Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, The University of Western Australia, Perth, Australia.

Bio Medical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, Australia.

Publication information

PLoS Comput Biol. 2022 Jun 24;18(6):e1010241. doi: 10.1371/journal.pcbi.1010241. eCollection 2022 Jun.

Abstract

Hi-C is a genome-wide chromosome conformation capture technology that detects interactions between pairs of genomic regions and exploits higher-order chromatin structures. Conceptually, Hi-C data record interaction frequencies between every position in the genome and every other position. Biologically functional interactions are expected to occur more frequently than transient background and artefactual interactions. To identify biologically relevant interactions, several background models that take biases such as distance, GC content and mappability into account have been proposed. Here we introduce MaxHiC, a background correction tool that deals with these complex biases and robustly identifies statistically significant interactions in both Hi-C and capture Hi-C experiments. MaxHiC uses a negative binomial distribution model and a maximum likelihood technique to correct biases in both Hi-C and capture Hi-C libraries. We systematically benchmark MaxHiC against major Hi-C background correction tools, including Hi-C significant interaction callers (SIC) and Hi-C loop callers, using published Hi-C, capture Hi-C, and Micro-C datasets. Our results demonstrate that (1) interacting regions identified by MaxHiC overlap significantly more with known regulatory features (e.g. active chromatin histone marks, CTCF binding sites, DNase sensitivity) and with disease-associated genome-wide association SNPs than those identified by existing models; (2) the pairs of interacting regions it identifies are more likely to be linked by eQTL pairs; and (3) those pairs are more likely to link known regulatory features, including known functional enhancer-promoter pairs validated by CRISPRi, than the pairs reported by any of the existing methods. We also demonstrate that interactions between different genomic region types have distinct distance distributions that are revealed only by MaxHiC. MaxHiC is publicly available as a Python package for the analysis of Hi-C, capture Hi-C and Micro-C data.
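To make the general idea of a negative binomial background model fitted by maximum likelihood more concrete, here is a minimal sketch in plain Python. It is not the MaxHiC implementation. The function names (`nb_logpmf`, `nb_sf`, `fit_dispersion`), the toy counts, and the simple inverse-distance expected mean are all assumptions made for illustration. The sketch fits only a single dispersion parameter, holds the expected means fixed, and ignores the bias terms a real tool would model.

```python
import math

def nb_logpmf(k, mu, r):
    """Log PMF of a negative binomial with mean mu and dispersion r
    (variance = mu + mu**2 / r)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))

def nb_sf(k, mu, r):
    """Upper-tail probability P(X >= k), computed as 1 - sum of PMF over 0..k-1."""
    cdf = sum(math.exp(nb_logpmf(i, mu, r)) for i in range(k))
    return max(0.0, 1.0 - cdf)

def fit_dispersion(counts, means, lo=1e-2, hi=1e3, iters=60):
    """Maximum likelihood estimate of r via golden-section search on log(r).
    Assumes the negative log-likelihood is unimodal on [lo, hi]."""
    def nll(log_r):
        r = math.exp(log_r)
        return -sum(nb_logpmf(k, m, r) for k, m in zip(counts, means))
    a, b = math.log(lo), math.log(hi)
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if nll(c) < nll(d):
            b = d
        else:
            a = c
    return math.exp((a + b) / 2)

# Toy data (invented): observed read counts for bin pairs and their genomic distances.
counts = [12, 7, 30, 3, 9, 2, 15, 1]
distances = [10_000, 20_000, 10_000, 50_000, 20_000, 100_000, 10_000, 100_000]

# Hypothetical background expectation: counts decay inversely with distance.
means = [100_000 / d for d in distances]

r_hat = fit_dispersion(counts, means)
p_values = [nb_sf(k, m, r_hat) for k, m in zip(counts, means)]

for k, m, p in zip(counts, means, p_values):
    print(f"observed={k:3d} expected={m:5.1f} p={p:.4f}")
```

Pairs whose observed count is far above the background expectation get small upper-tail p-values. In practice these would still need multiple-testing correction before any interaction is called significant.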

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7425/9262194/5d61db18b891/pcbi.1010241.g001.jpg
