Sequence-based multiscale modeling for high-throughput chromosome conformation capture (Hi-C) data analysis.

Author information

Xia Kelin

Affiliations

Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore.

School of Biological Sciences, Nanyang Technological University, Singapore 637371, Singapore.

Publication information

PLoS One. 2018 Feb 6;13(2):e0191899. doi: 10.1371/journal.pone.0191899. eCollection 2018.

DOI: 10.1371/journal.pone.0191899

PMID: 29408904

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5800693/

Abstract

In this paper, we introduce sequence-based multiscale modeling for biomolecular data analysis. We employ a spectral clustering method in our modeling and reveal the difference between sequence-based global-scale clustering and local-scale clustering. Essentially, two types of distance, Euclidean (or spatial) distance and genomic (or sequential) distance, can be used in data clustering. Clusters from sequence-based global-scale models optimize spatial distances, meaning that spatially adjacent loci are more likely to be assigned to the same cluster. Sequence-based local-scale models, on the other hand, produce clusters that optimize genomic distances; that is, in these models sequentially adjacent loci tend to be clustered together. We propose two sequence-based multiscale models (SeqMMs) for the study of hierarchical chromosome structures, including genomic compartments and topologically associated domains (TADs). We find that genomic compartments are determined only by global-scale information in the Hi-C data: removing all local interactions within a band region as large as 10 Mb in genomic distance has almost no influence on the final compartment results. Further, in TAD analysis, we find that when the sequential scale is small, a tiny variation in the diagonal band region of a contact map results in a large change in the predicted TAD boundaries. When the scale value exceeds a threshold, the TAD boundaries become very consistent. This threshold is highly related to TAD sizes. Comparing our results with those previously obtained using a spectral clustering model, we find that our method is more robust and reliable. Finally, we demonstrate that almost all TAD boundaries from both clustering methods are local minima of a TAD summation function.
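
The contrast between global-scale and local-scale clustering can be illustrated with a small sketch. It is not the SeqMM implementation from the paper: the toy contact map, the function names `band_filter` and `fiedler_bipartition`, and the choice of a two-way split on the normalized graph Laplacian are all assumptions made here for illustration. The sketch masks a contact matrix by genomic distance (in bins) and then bipartitions the loci by the sign of the Fiedler vector.

```python
import numpy as np


def band_filter(contact, scale, keep="global"):
    """Mask a contact map by genomic distance |i - j| (hypothetical helper).

    keep="global": zero the diagonal band |i - j| <= scale (drop local contacts).
    keep="local":  keep only that band (drop long-range contacts).
    """
    idx = np.arange(contact.shape[0])
    in_band = np.abs(idx[:, None] - idx[None, :]) <= scale
    mask = in_band if keep == "local" else ~in_band
    return np.where(mask, contact, 0.0)


def fiedler_bipartition(weights):
    """Two-way spectral clustering via the sign of the Fiedler vector.

    Assumes a symmetric, non-negative weight matrix whose graph is connected.
    """
    w = np.array(weights, dtype=float)
    np.fill_diagonal(w, 0.0)  # ignore self-contacts
    degree = w.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    inv_sqrt[degree > 0] = 1.0 / np.sqrt(degree[degree > 0])
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return (vectors[:, 1] > 0).astype(int)


# Toy data (assumed): 8 loci in two alternating compartments, with strong
# contacts between sequential neighbours along the diagonal band.
compartments = np.array([0, 0, 1, 1, 0, 0, 1, 1])
same = compartments[:, None] == compartments[None, :]
contact = np.where(same, 1.0, 0.1)
positions = np.arange(8)
contact = contact + 5.0 * (np.abs(positions[:, None] - positions[None, :]) <= 1)

# Global scale: band removed, clusters follow the compartment pattern.
global_labels = fiedler_bipartition(band_filter(contact, scale=1, keep="global"))
# Local scale: band only, clusters are contiguous stretches of the sequence.
local_labels = fiedler_bipartition(band_filter(contact, scale=1, keep="local"))

print("global:", global_labels)
print("local: ", local_labels)
```

In this toy setting the global-scale labels group loci that share long-range contacts even when they are far apart in sequence, while the local-scale labels split the chain into two contiguous segments. Which cluster receives label 0 or 1 depends on the arbitrary sign of the eigenvector.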

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db23/5800693/d5c2577c23da/pone.0191899.g001.jpg
