Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing, China 100124.
Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac168.
The three-dimensional (3D) chromosomal structure plays an essential role in all DNA-templated processes, including gene transcription, DNA replication and other cellular processes. Chromosome conformation capture (3C) methods such as Hi-C generate chromosomal contact data that characterize genome-wide chromosomal structural properties. Even so, our understanding of the 3D genome based on Hi-C data is still limited. Here, we propose, for the first time, a persistent spectral simplicial complex (PerSpectSC) model to describe Hi-C data. Specifically, a filtration process is introduced to generate a series of nested simplicial complexes at different scales. For each of these simplicial complexes, spectral information can be calculated from the corresponding Hodge Laplacian matrices. The PerSpectSC model describes the persistence and variation of this spectral information across the nested simplicial complexes during the filtration process. Unlike all previous models, our PerSpectSC-based features provide a quantitative, global-scale characterization of chromosome structures and topology. Our descriptors can successfully classify cell types and cellular differentiation stages for all 24 types of chromosomes simultaneously. In particular, the persistent minimum best characterizes cell types, and the Dim(1) persistent multiplicity best characterizes cellular differentiation. These results demonstrate the great potential of our PerSpectSC-based models in polymeric data analysis.
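To make the filtration and Hodge Laplacian steps concrete, here is a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. First, a symmetric distance matrix has already been derived from the Hi-C contact matrix; how that conversion is done is not specified here. Second, the nested complexes are built as a Vietoris-Rips filtration truncated at triangles. Third, "multiplicity" is read as the number of zero eigenvalues, which equals the Betti number of that dimension, and "minimum" as the smallest non-zero eigenvalue. The paper's exact definitions may differ. The helper names `rips_complex`, `boundary_matrix`, `hodge_laplacians`, `spectral_features` and `persistent_spectra` are hypothetical.

```python
from itertools import combinations

import numpy as np


def rips_complex(dist, eps):
    """Vertices, edges and triangles of the Vietoris-Rips complex at scale eps (assumed construction)."""
    n = dist.shape[0]
    vertices = [(i,) for i in range(n)]
    edges = [(i, j) for i, j in combinations(range(n), 2) if dist[i, j] <= eps]
    edge_set = set(edges)
    # A triangle is included only when all three of its edges are present.
    triangles = [t for t in combinations(range(n), 3)
                 if {(t[0], t[1]), (t[0], t[2]), (t[1], t[2])} <= edge_set]
    return vertices, edges, triangles


def boundary_matrix(faces, simplices):
    """Oriented boundary matrix mapping k-simplices (columns) to (k-1)-faces (rows)."""
    index = {f: r for r, f in enumerate(faces)}
    B = np.zeros((len(faces), len(simplices)))
    for c, s in enumerate(simplices):
        for k in range(len(s)):
            face = s[:k] + s[k + 1:]  # drop the k-th vertex
            B[index[face], c] = (-1) ** k
    return B


def hodge_laplacians(dist, eps):
    """Dimension-0 and dimension-1 Hodge Laplacians of the complex at scale eps."""
    V, E, T = rips_complex(dist, eps)
    B1 = boundary_matrix(V, E)  # shape (|V|, |E|)
    B2 = boundary_matrix(E, T)  # shape (|E|, |T|)
    L0 = B1 @ B1.T
    L1 = B1.T @ B1 + B2 @ B2.T
    return L0, L1


def spectral_features(L, tol=1e-8):
    """(number of zero eigenvalues, smallest non-zero eigenvalue or None)."""
    if L.size == 0:  # no simplices of this dimension yet
        return 0, None
    eig = np.linalg.eigvalsh(L)  # L is symmetric, so eigvalsh applies
    nonzero = eig[eig > tol]
    return int(np.sum(eig <= tol)), (float(nonzero.min()) if nonzero.size else None)


def persistent_spectra(dist, scales):
    """Track the spectral features of L0 and L1 along the filtration."""
    return [(eps, *map(spectral_features, hodge_laplacians(dist, eps))) for eps in scales]
```

As a usage example, take four points at the corners of a unit square. At scale 1.0 the four sides form a loop, which appears as one zero eigenvalue of L1. At scale 1.5 the diagonals and triangles appear, the loop is filled, and that zero eigenvalue disappears:

```python
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
for eps, dim0, dim1 in persistent_spectra(dist, [0.5, 1.0, 1.5]):
    print(eps, dim0, dim1)
```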