Inference of 3D genome architecture by modeling overdispersion of Hi-C data.

Author information

TIMC, Université Grenoble Alpes, CNRS, Grenoble INP, Grenoble 38000, France.

Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.

Publication information

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac838.

DOI:10.1093/bioinformatics/btac838

PMID:36594573

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9857972/

Abstract

MOTIVATION

We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first, convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum likelihood approach, modeling the contact counts between two loci as a Poisson random variable whose intensity is a decreasing function of the distance between them. However, a Poisson model of contact counts implies that the variance of the data is equal to the mean, a relationship that is often too restrictive to properly model count data.
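The Poisson formulation described above can be sketched in a few lines. This is a minimal illustration, not the Pastis code: the abstract only says the intensity decreases with distance, so the power-law form `beta * d**alpha` with `alpha < 0`, the default parameter values, and both function names are assumptions made for the example.

```python
import numpy as np
from scipy.special import gammaln


def pairwise_distances(X):
    """Euclidean distances between all rows of an (n, 3) coordinate array."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def poisson_neg_log_likelihood(X, counts, alpha=-3.0, beta=1.0):
    """Negative log-likelihood of counts under c_ij ~ Poisson(beta * d_ij**alpha).

    alpha < 0 makes the intensity a decreasing function of distance.
    The power-law form and the defaults are illustrative assumptions.
    """
    iu = np.triu_indices(X.shape[0], k=1)  # use each unordered pair once
    d = pairwise_distances(X)[iu]
    c = counts[iu]
    lam = beta * d ** alpha                # Poisson intensity per pair
    # -log P(c | lam) = lam - c*log(lam) + log(c!)
    return float(np.sum(lam - c * np.log(lam) + gammaln(c + 1)))
```

A maximum likelihood method would minimize this quantity over the coordinates `X`. Because a Poisson variable has variance equal to its mean, this objective has no way to account for counts that scatter more widely than their mean.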

RESULTS

We first confirm the presence of overdispersion in several real Hi-C datasets, and we show that the overdispersion arises even in simulated datasets. We then propose a new model, called Pastis-NB, where we replace the Poisson model of contact counts by a negative binomial one, which is parametrized by a mean and a separate dispersion parameter. The dispersion parameter allows the variance to be adjusted independently from the mean, thus better modeling overdispersed data. We compare the results of Pastis-NB to those of several previously published algorithms, both MDS-based and statistical methods. We show that the negative binomial inference yields more accurate structures on simulated data, and more robust structures than other models across real Hi-C replicates and across different resolutions.
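To make the role of the dispersion parameter concrete, the sketch below uses one common mean/dispersion parametrization of the negative binomial, in which the variance is `mu + mu**2 / r`. The abstract does not give the exact parametrization used in Pastis-NB, so this form, the symbol `r`, and the helper names are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import nbinom


def nb_params(mu, r):
    """Convert (mean mu, dispersion r) to scipy's nbinom (n, p) arguments."""
    return r, r / (r + mu)


def nb_variance(mu, r):
    """Variance under this parametrization: mu + mu**2 / r (always above mu)."""
    return mu + mu ** 2 / r


def nb_neg_log_likelihood(counts, mu, r):
    """Negative log-likelihood of counts under a negative binomial model."""
    n, p = nb_params(mu, r)
    return float(-nbinom.logpmf(counts, n, p).sum())
```

With `mu = 10` and `r = 2` the variance is 60, six times the mean, whereas a Poisson model would force it to 10. As `r` grows the extra term `mu**2 / r` vanishes and the model approaches the Poisson case, which is why the negative binomial can be read as a relaxation of the Poisson model rather than a different family of assumptions.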

AVAILABILITY AND IMPLEMENTATION

A Python implementation of Pastis-NB is available at https://github.com/hiclib/pastis under the BSD license.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
