IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1448-1458. doi: 10.1109/TCBB.2018.2851200. Epub 2018 Jun 28.
Recent studies of the 3-dimensional conformation of chromatin have revealed the important role it plays in a range of cellular processes. These studies have also led to the discovery of densely interacting segments of the chromosome, called topologically associating domains. The accurate identification of these domains from Hi-C interaction data is an interesting and important computational problem for which numerous methods have been proposed. Unfortunately, most existing algorithms designed to identify these domains assume that they are non-overlapping, whereas substantial evidence points to a nested structure. We present a methodology to predict hierarchical chromatin domains using chromatin conformation capture data. Our method predicts domains at different resolutions, calculated using intrinsic properties of the chromatin data, and effectively clusters these to construct the hierarchy. At each level of the hierarchy, the domains are non-overlapping and are chosen so that the intra-domain interaction frequencies are maximized. We show that our predicted structure is highly enriched for actively transcribed housekeeping genes and various chromatin markers, including CTCF, around the domain boundaries. We further show that large-scale domains, at multiple resolutions within our hierarchy, are conserved across cell types and species. Finally, we provide comparisons against existing tools for extracting hierarchical domains. Our software, Matryoshka, is written in C++11 and licensed under GPL v3; it is available at https://github.com/COMBINE-lab/matryoshka.
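As a rough illustration of the single-level step described above (non-overlapping domains chosen to maximize intra-domain interaction frequency), the following Python sketch partitions a toy contact matrix with dynamic programming. It is not taken from Matryoshka: the function names, the toy matrix, the length-normalized score, and the resolution parameter gamma are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def intra_sums(contacts: List[List[float]]) -> Callable[[int, int], float]:
    """Return s(i, j): summed contacts over all bin pairs a < b within bins i..j (inclusive)."""
    n = len(contacts)
    # 2D prefix sums: prefix[a][b] = sum of contacts[x][y] for x < a, y < b.
    prefix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for a in range(n):
        for b in range(n):
            prefix[a + 1][b + 1] = (
                contacts[a][b] + prefix[a][b + 1] + prefix[a + 1][b] - prefix[a][b]
            )
    # 1D prefix sums of the diagonal, so self-contacts can be excluded.
    diag = [0.0] * (n + 1)
    for k in range(n):
        diag[k + 1] = diag[k] + contacts[k][k]

    def s(i: int, j: int) -> float:
        square = prefix[j + 1][j + 1] - prefix[i][j + 1] - prefix[j + 1][i] + prefix[i][i]
        # The matrix is assumed symmetric, so each off-diagonal pair appears twice.
        return (square - (diag[j + 1] - diag[i])) / 2.0

    return s


def best_partition(contacts: List[List[float]], gamma: float) -> List[Tuple[int, int]]:
    """Non-overlapping domains maximizing the sum of s(i, j) / length**gamma.

    gamma is a hypothetical resolution parameter: larger values favor smaller domains.
    Single-bin segments score zero and are reported as gaps, not domains.
    """
    n = len(contacts)
    s = intra_sums(contacts)
    best = [0.0] * (n + 1)   # best[k]: optimal total score for bins 0..k-1
    back = [0] * (n + 1)     # back[k]: start bin of the last segment in that optimum
    for end in range(1, n + 1):
        best[end] = float("-inf")
        for start in range(end):
            length = end - start
            score = best[start] + s(start, end - 1) / (length ** gamma)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Trace back the chosen segments, keeping only those longer than one bin.
    domains: List[Tuple[int, int]] = []
    end = n
    while end > 0:
        start = back[end]
        if end - start > 1:
            domains.append((start, end - 1))
        end = start
    domains.reverse()
    return domains


# Toy symmetric matrix: two dense 3-bin blocks with weak contacts between them.
toy = [[10.0 if (a < 3) == (b < 3) else 1.0 for b in range(6)] for a in range(6)]
print(best_partition(toy, gamma=1.0))  # [(0, 2), (3, 5)]
```

Rerunning best_partition with different gamma values would yield partitions at coarser or finer scales. How such levels are selected and clustered into a hierarchy is not specified in the abstract and is not attempted here.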