Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037.
Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093.
Proc Natl Acad Sci U S A. 2019 Jul 9;116(28):14011-14018. doi: 10.1073/pnas.1901423116. Epub 2019 Jun 24.
Three-dimensional genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that can accurately and efficiently cluster such data into constituent cell types. Here, we describe scHiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real single-cell Hi-C data as benchmarks, scHiCluster significantly improves clustering accuracy when applied to low-coverage datasets compared with existing methods. After imputation by scHiCluster, topologically associating domain (TAD)-like structures (TLSs) can be identified within single cells, and their consensus boundaries are enriched at the TAD boundaries observed in bulk cell Hi-C samples. In summary, scHiCluster facilitates visualization and comparison of single-cell 3D genomes.
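The abstract names two imputation steps, linear convolution and random walk, without giving their details. The following is a minimal sketch of how such a two-step imputation could look for one chromosome's contact matrix, not the authors' implementation. The function names, the window size (`pad`), the restart probability (`restart`), and the added self-loops are all illustrative assumptions that do not come from the abstract.

```python
import numpy as np

def smooth_contacts(mat, pad=1):
    """Linear convolution: replace each entry by the sum over its
    (2*pad+1) x (2*pad+1) neighborhood (zero-padded at the edges).
    The window size is an assumed, illustrative parameter."""
    n = mat.shape[0]
    padded = np.pad(mat, pad, mode="constant")
    out = np.zeros((n, n), dtype=float)
    for dx in range(2 * pad + 1):
        for dy in range(2 * pad + 1):
            out += padded[dx:dx + n, dy:dy + n]
    return out

def random_walk_with_restart(mat, restart=0.5, tol=1e-6, max_iter=100):
    """Propagate contacts along the graph defined by `mat`.
    Self-loops are added (an assumption) so that no row sums to zero."""
    n = mat.shape[0]
    identity = np.eye(n)
    adjacency = mat + identity
    # Row-normalize to obtain a transition matrix.
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)
    q = identity.copy()
    for _ in range(max_iter):
        q_next = (1 - restart) * q @ transition + restart * identity
        converged = np.linalg.norm(q_next - q) < tol
        q = q_next
        if converged:
            break
    return q

def impute(mat, pad=1, restart=0.5):
    """Convolution followed by random walk with restart."""
    return random_walk_with_restart(smooth_contacts(mat, pad), restart)

if __name__ == "__main__":
    # Toy 6-bin contact matrix with three observed contacts.
    raw = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (3, 4)]:
        raw[i, j] = raw[j, i] = 1
    imputed = impute(raw)
    print(np.count_nonzero(raw), "->", np.count_nonzero(imputed), "nonzero entries")
```

In this sketch the convolution spreads each observed contact to neighboring bins, and the random walk then shares signal between bins that are connected through intermediate contacts, which is one way to densify a sparse single-cell matrix before clustering.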