MOE Key Laboratory of Bioinformatics, Division of Bioinformatics, BNRIST and Department of Automation, Tsinghua University, Beijing 100084, China.
School of Life Sciences, Tsinghua University, Beijing 100084, China.
Bioinformatics. 2021 Nov 5;37(21):3964-3965. doi: 10.1093/bioinformatics/btab420.
Clustering is a key step in revealing heterogeneities in single-cell data. Most existing single-cell clustering methods output a fixed number of clusters and provide no hierarchical information about how those clusters relate. Classical hierarchical clustering (HC) provides dendrograms of cells, but its high computational complexity prevents it from scaling to large datasets. We present HGC, a fast Hierarchical Graph-based Clustering tool that addresses both problems. It combines the advantages of graph-based clustering and HC. On the shared nearest-neighbor graph of cells, HGC constructs the hierarchical tree with linear time complexity. Experiments showed that HGC enables multiresolution exploration of the biological hierarchy underlying the data, achieves state-of-the-art accuracy on benchmark data and can scale to large datasets.
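To make "shared nearest-neighbor graph" concrete, here is a minimal sketch of one common way such a graph is built. It is not HGC's implementation, which is the R package linked below. The function names `knn_sets` and `snn_graph` are hypothetical, and weighting each edge by the Jaccard index of the two cells' k-nearest-neighbor sets is an assumption. The brute-force neighbor search is O(n²) and only illustrates the graph that a hierarchical tree would then be built on. It does not implement the linear-time tree construction described above.

```python
import math

def knn_sets(points, k):
    """For each point, return the set {itself} plus its k nearest neighbors
    (Euclidean distance, brute force; hypothetical helper for illustration)."""
    n = len(points)
    result = []
    for i in range(n):
        # Sort all other points by distance to point i and keep the k closest.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        result.append({i, *others[:k]})
    return result

def snn_graph(points, k):
    """Build a shared nearest-neighbor graph as {(i, j): weight} with i < j.
    Assumption: weight = Jaccard index of the two neighbor sets, and an edge
    exists only if the two points share at least one neighbor."""
    neigh = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(neigh[i] & neigh[j])
            if shared > 0:
                edges[(i, j)] = shared / len(neigh[i] | neigh[j])
    return edges

# Two well-separated groups of three "cells" each.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
graph = snn_graph(cells, k=2)
print(graph)
```

With `k=2`, each cell's neighbor set is exactly its own group, so the printed graph connects cells within each group with weight 1.0 and has no edges between the groups. A graph-based hierarchical method then works on this sparse structure instead of on a dense all-pairs distance matrix.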
The R package of HGC is available at https://bioconductor.org/packages/HGC/.
Supplementary data are available at Bioinformatics online.