School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China.
School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China.
Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae138.
Topologically associating domains (TADs) are fundamental building blocks of the 3D genome. TAD-like domains in single cells are regarded as the underlying origin of the TADs observed in bulk cells. Understanding the organization of TAD-like domains gives deeper insight into their regulatory functions. However, identifying TAD-like domains in single-cell Hi-C data remains a challenge because the data are ultra-sparse.
We propose scKTLD, an in silico tool for identifying TAD-like domains in single-cell Hi-C data. It treats the Hi-C contact matrix as the adjacency matrix of a graph, embeds the graph structure into a low-dimensional space using sparse matrix factorization followed by spectral propagation, and then identifies TAD-like domains by kernel-based changepoint detection in the embedding space. The results show that scKTLD outperforms the other methods compared on sparse contact matrices, including downsampled bulk Hi-C data as well as simulated and experimental single-cell Hi-C data. In addition, we demonstrate that TAD-like domain boundaries are conserved at the single-cell level, alongside heterogeneity within and across cell types. Boundaries that occur at higher frequency across single cells are more enriched for architectural proteins and chromatin marks, and they preferentially coincide with TAD boundaries in bulk cells, especially those at higher hierarchical levels.
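The three steps described above (contact matrix as graph adjacency, low-dimensional embedding, kernel-based changepoint detection) can be illustrated with a minimal NumPy sketch. This is not the scKTLD implementation or its API: the function names `embed_contacts` and `kernel_changepoints` are hypothetical, a truncated SVD of the normalized adjacency matrix stands in for the sparse matrix factorization with spectral propagation, and the number of boundaries is assumed to be supplied by the user.

```python
import numpy as np


def embed_contacts(contacts, dim=8):
    """Stand-in embedding (assumption): truncated SVD of the symmetrically
    normalized adjacency matrix, not the factorization used by scKTLD."""
    a = np.asarray(contacts, dtype=float)
    a = a + np.eye(len(a))              # self-loops avoid zero-degree bins
    deg = a.sum(axis=1)
    norm = a / np.sqrt(np.outer(deg, deg))
    u, s, _ = np.linalg.svd(norm)
    return u[:, :dim] * np.sqrt(s[:dim])


def kernel_changepoints(x, n_bkps, gamma=None, min_size=2):
    """Kernel changepoint detection with an RBF kernel, solved exactly by
    dynamic programming for a fixed number of breakpoints."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sq = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    if gamma is None:
        scale = sq.mean()               # simple bandwidth heuristic (assumption)
        gamma = 1.0 / scale if scale > 0 else 1.0
    k = np.exp(-gamma * sq)

    # 2D prefix sums of the Gram matrix give any block sum in O(1).
    p = np.zeros((n + 1, n + 1))
    p[1:, 1:] = k.cumsum(axis=0).cumsum(axis=1)
    diag = np.concatenate([[0.0], np.cumsum(np.diag(k))])

    def cost(s, e):
        # Within-segment scatter in feature space for bins [s, e).
        block = p[e, e] - p[s, e] - p[e, s] + p[s, s]
        return (diag[e] - diag[s]) - block / (e - s)

    n_seg = n_bkps + 1
    best = np.full((n_seg + 1, n + 1), np.inf)
    back = np.zeros((n_seg + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for m in range(1, n_seg + 1):
        for e in range(m * min_size, n + 1):
            for s in range((m - 1) * min_size, e - min_size + 1):
                c = best[m - 1, s] + cost(s, e)
                if c < best[m, e]:
                    best[m, e] = c
                    back[m, e] = s

    # Backtrack from the last bin to recover the segment starts.
    bkps, e = [], n
    for m in range(n_seg, 1, -1):
        e = back[m, e]
        bkps.append(int(e))
    return sorted(bkps)


# Toy contact matrix: three domains of 10, 15 and 5 bins, no noise.
sizes = [10, 15, 5]
toy = np.zeros((sum(sizes), sum(sizes)))
start = 0
for size in sizes:
    toy[start:start + size, start:start + size] = 1.0
    start += size

boundaries = kernel_changepoints(embed_contacts(toy, dim=3), n_bkps=2)
print(boundaries)
```

On this noise-free toy matrix the recovered boundaries coincide with the block edges, at bins 10 and 25. Real single-cell matrices are far sparser and noisier, which is the setting the embedding step in scKTLD is designed for; the sketch only shows how a changepoint search in an embedding space can yield domain boundaries.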
scKTLD is freely available at https://github.com/lhqxinghun/scKTLD.