HGC: fast hierarchical clustering for large-scale single-cell data.

Affiliations

MOE Key Laboratory of Bioinformatics, Division of Bioinformatics, BNRIST and Department of Automation, Tsinghua University, Beijing 100084, China.

School of Life Sciences, Tsinghua University, Beijing 100084, China.

Publication

Bioinformatics. 2021 Nov 5;37(21):3964-3965. doi: 10.1093/bioinformatics/btab420.

PMID: 34096998

Abstract

SUMMARY

Clustering is a key step in revealing heterogeneities in single-cell data. Most existing single-cell clustering methods output a fixed number of clusters without hierarchical information. Classical hierarchical clustering (HC) provides dendrograms of cells, but cannot scale to large datasets due to high computational complexity. We present HGC, a fast Hierarchical Graph-based Clustering tool to address both problems. It combines the advantages of graph-based clustering and HC. On the shared nearest-neighbor graph of cells, HGC constructs the hierarchical tree with linear time complexity. Experiments showed that HGC enables multiresolution exploration of the biological hierarchy underlying the data, achieves state-of-the-art accuracy on benchmark data and can scale to large datasets.
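The abstract compresses the core idea into one sentence: build a shared nearest-neighbor (SNN) graph of cells, then grow a hierarchical tree whose merges follow the graph's edges. The Python sketch below illustrates that idea under stated assumptions and is not HGC's implementation. It assumes SNN edge weights are the Jaccard overlap of each cell's k-nearest-neighbor set (plus the cell itself) and that clusters merge by graph-constrained average linkage; the helper names `snn_graph`, `graph_agglomerate` and `cut` are hypothetical. Unlike HGC, which the authors report builds the tree in linear time, this naive loop rescans every edge at each merge, so it only suits toy inputs.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of each row's k nearest neighbours (Euclidean), excluding itself."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # O(n^2) memory: toy data only
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def snn_graph(X, k):
    """Assumed SNN weighting: Jaccard overlap of two cells' kNN sets (self included)."""
    nn = knn_indices(X, k)
    sets = [set(row.tolist()) | {i} for i, row in enumerate(nn)]
    W = {}
    for i, row in enumerate(nn):
        for j in row.tolist():
            a, b = min(i, j), max(i, j)
            if (a, b) not in W:
                W[(a, b)] = len(sets[a] & sets[b]) / len(sets[a] | sets[b])
    return W  # {(a, b): weight} with a < b


def graph_agglomerate(n, W):
    """Merge only clusters joined by SNN edges, highest average link first.
    Returns merges as (cluster_a, cluster_b, new_cluster_id, similarity)."""
    size = {i: 1 for i in range(n)}
    adj = {i: {} for i in range(n)}  # cluster -> {neighbour: summed edge weight}
    for (a, b), w in W.items():
        adj[a][b] = adj[a].get(b, 0.0) + w
        adj[b][a] = adj[b].get(a, 0.0) + w
    merges, next_id = [], n
    while True:
        best = None
        for a in adj:
            for b, s in adj[a].items():
                if a < b:
                    score = s / (size[a] * size[b])  # average linkage over edges
                    if best is None or score > best[0]:
                        best = (score, a, b)
        if best is None:
            break  # no edges left: remaining clusters are disconnected components
        score, a, b = best
        c, next_id = next_id, next_id + 1
        size[c] = size.pop(a) + size.pop(b)
        nbrs = {}
        for old in (a, b):  # the new cluster inherits both clusters' edges
            for m, s in adj.pop(old).items():
                if m not in (a, b):
                    nbrs[m] = nbrs.get(m, 0.0) + s
                    del adj[m][old]
        adj[c] = nbrs
        for m, s in nbrs.items():
            adj[m][c] = s
        merges.append((a, b, c, score))
    return merges


def cut(n, merges, n_clusters):
    """Cell labels after replaying merges until n_clusters remain
    (more clusters remain if the graph has more components than that)."""
    parent = list(range(n + len(merges)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, c, _ in merges[: max(0, n - n_clusters)]:
        parent[a] = c
        parent[b] = c
    roots = [find(i) for i in range(n)]
    relabel = {r: idx for idx, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical, well-separated groups of 20 "cells" in a 2-D embedding.
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
    merges = graph_agglomerate(len(X), snn_graph(X, k=8))
    for n_clusters in (2, 4, 8):  # one tree, several resolutions
        print(n_clusters, cut(len(X), merges, n_clusters))
```

Cutting the same merge list at different `n_clusters` values gives a multiresolution view of one tree without re-running the clustering. Because merges only follow SNN edges, two groups of cells with no edges between them are never mixed at any level of the tree.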

AVAILABILITY AND IMPLEMENTATION

The R package of HGC is available at https://bioconductor.org/packages/HGC/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
