Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China.
Department of Biostatistics, Yale University, New Haven, CT, USA.
Nucleic Acids Res. 2018 May 4;46(8):e50. doi: 10.1093/nar/gky065.
Decoding the spatial organization of chromosomes has crucial implications for studying eukaryotic gene regulation. Recently, chromosome conformation capture-based technologies, such as Hi-C, have been widely used to measure the interaction frequencies of genomic loci in a high-throughput, genome-wide manner, providing new insights into the folding of the three-dimensional (3D) genome. In this paper, we develop a novel manifold learning-based framework, called GEM (Genomic organization reconstructor based on conformational Energy and Manifold learning), to reconstruct the 3D organization of chromosomes by integrating Hi-C data with biophysical feasibility. Unlike previous methods, which explicitly assume a specific relationship between Hi-C interaction frequencies and spatial distances, our model directly embeds the neighboring affinities from Hi-C space into 3D Euclidean space. Extensive validation demonstrated that GEM not only greatly outperformed other state-of-the-art modeling methods but also provided physically and physiologically valid 3D representations of chromosome organization. Furthermore, for the first time, we apply the modeled chromatin structures to recover long-range genomic interactions missing from the original Hi-C data.
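To illustrate what embedding neighboring affinities from Hi-C space into 3D Euclidean space can look like, the following is a minimal sketch and not the GEM implementation. It assumes a generic t-SNE-style neighbor embedding as a stand-in for the manifold learning step, which the abstract does not specify, and it omits the conformational energy term entirely. The function names `affinities_from_contacts` and `embed_3d`, the row normalization, the learning rate, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def affinities_from_contacts(contacts):
    """Convert a symmetric contact-count matrix into joint affinities.

    Assumption: each row is normalized to a conditional distribution over
    neighbors, then symmetrized and rescaled so all entries sum to 1.
    """
    c = np.asarray(contacts, dtype=float).copy()
    np.fill_diagonal(c, 0.0)                 # ignore self-contacts
    row_sums = c.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0            # avoid division by zero
    cond = c / row_sums                      # conditional affinity p(j | i)
    joint = cond + cond.T                    # symmetrize
    return joint / joint.sum()


def embed_3d(contacts, n_iter=500, lr=1.0, seed=0):
    """Place each locus in 3D by gradient descent on KL(P || Q).

    P comes from the contact matrix; Q comes from a Student-t kernel on the
    current 3D coordinates (the usual t-SNE choice, assumed here).
    """
    p = affinities_from_contacts(contacts)
    n = p.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=1e-2, size=(n, 3))  # small random initial coordinates

    for _ in range(n_iter):
        diff = x[:, None, :] - x[None, :, :]         # pairwise x_i - x_j
        w = 1.0 / (1.0 + (diff ** 2).sum(axis=-1))   # Student-t kernel
        np.fill_diagonal(w, 0.0)
        q = w / w.sum()                              # affinities in 3D
        # Gradient of KL(P || Q) with respect to each coordinate x_i.
        grad = 4.0 * (((p - q) * w)[:, :, None] * diff).sum(axis=1)
        x -= lr * grad
    return x


if __name__ == "__main__":
    # Toy contact map for a 30-locus chain: contacts decay with
    # genomic separation (synthetic data, for demonstration only).
    idx = np.arange(30)
    toy_contacts = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
    coords = embed_3d(toy_contacts)
    print(coords.shape)
```

Note that the sketch never converts interaction frequencies into target distances. It only asks that loci with high affinity in the contact map also have high affinity in the 3D layout, which is the distinction the abstract draws against methods that assume a fixed frequency-to-distance relationship.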