Significance analysis for clustering with single-cell RNA-sequencing data.

Author information

Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.

Division of Biostatistics, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.

Publication information

Nat Methods. 2023 Aug;20(8):1196-1202. doi: 10.1038/s41592-023-01933-9. Epub 2023 Jul 10.

DOI:10.1038/s41592-023-01933-9

PMID:37429993

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11282907/

Abstract

Unsupervised clustering of single-cell RNA-sequencing data enables the identification of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. We find that not addressing known sources of variability in a statistically rigorous manner can lead to overconfidence in the discovery of novel cell types. Here we extend a previous method, significance of hierarchical clustering, to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment on the clusters reported by any algorithm. Finally, we extend these approaches to account for batch structure. We benchmarked our approach against popular clustering workflows, demonstrating improved performance. To show practical utility, we applied our approach to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex, identifying several cases of over-clustering and recapitulating experimentally validated cell type definitions.
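
The idea of asking whether a proposed split into two clusters is stronger than what a single population would produce can be illustrated with a small Monte Carlo test. The following is a minimal sketch of that general idea only, not the authors' method or software: the single-Gaussian null model, the 2-means cluster index used as the test statistic, and the function names `two_means_cluster_index` and `split_p_value` are all illustrative assumptions, and the input is assumed to be an already normalized, low-dimensional cell-by-feature matrix.

```python
import numpy as np


def two_means_cluster_index(x, n_init=5, n_iter=50, seed=0):
    """Within-cluster / total sum of squares for the best 2-means split.

    Smaller values mean the two groups are more clearly separated.
    """
    rng = np.random.default_rng(seed)
    total = ((x - x.mean(axis=0)) ** 2).sum()
    best = np.inf
    for _ in range(n_init):
        # Start each restart from two distinct, randomly chosen cells.
        centers = x[rng.choice(len(x), size=2, replace=False)]
        for _ in range(n_iter):
            dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            if labels.min() == labels.max():
                break  # one cluster emptied; keep the current centers
            centers = np.stack([x[labels == k].mean(axis=0) for k in (0, 1)])
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, dist.min(axis=1).sum())
    return best / total


def split_p_value(x, n_sim=200, seed=0):
    """Monte Carlo p-value for 'one population' vs. 'two populations'.

    Assumed null: all cells come from one multivariate Gaussian whose mean
    and covariance are estimated from the data (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    observed = two_means_cluster_index(x, seed=seed)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    null = np.array([
        two_means_cluster_index(
            rng.multivariate_normal(mean, cov, size=len(x)), seed=seed
        )
        for _ in range(n_sim)
    ])
    # Small observed index relative to the null suggests a real split.
    return (1 + (null <= observed).sum()) / (1 + n_sim)


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    # Two synthetic, well-separated groups of 100 "cells" in 2 dimensions.
    cells = np.vstack([
        demo_rng.normal(0, 1, (100, 2)),
        demo_rng.normal(0, 1, (100, 2)) + [8, 0],
    ])
    print("p-value for the split:", split_p_value(cells))
```

Applied recursively down a clustering tree, a test of this kind would stop splitting once a proposed division is no longer significant, which is one way to guard against over-clustering. Count data from real experiments would need a more suitable null model than the Gaussian assumed here.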

Similar articles

Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters.

BMC Bioinformatics. 2019 Jul 1;20(1):369. doi: 10.1186/s12859-019-2951-x.

A hybrid deep clustering approach for robust cell type profiling using single-cell RNA-seq data.

RNA. 2020 Oct;26(10):1303-1319. doi: 10.1261/rna.074427.119. Epub 2020 Jun 12.

Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data.

Genome Biol. 2022 Feb 8;23(1):49. doi: 10.1186/s13059-022-02622-0.

Review of single-cell RNA-seq data clustering for cell-type identification and characterization.

RNA. 2023 May;29(5):517-530. doi: 10.1261/rna.078965.121. Epub 2023 Feb 3.

CellBIC: bimodality-based top-down clustering of single-cell RNA sequencing data reveals hierarchical structure of the cell type.

Nucleic Acids Res. 2018 Nov 30;46(21):e124. doi: 10.1093/nar/gky698.

Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations.

Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad335.

Single-cell data clustering based on sparse optimization and low-rank matrix factorization.

G3 (Bethesda). 2021 Jun 17;11(6). doi: 10.1093/g3journal/jkab098.

SCMcluster: a high-precision cell clustering algorithm integrating marker gene set with single-cell RNA sequencing data.

Brief Funct Genomics. 2023 Jul 17;22(4):329-340. doi: 10.1093/bfgp/elad004.

Accurate feature selection improves single-cell RNA-seq cell clustering.

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab034.

Cited by

Comparative benchmarking of single-cell clustering algorithms for transcriptomic and proteomic data.

Genome Biol. 2025 Sep 3;26(1):265. doi: 10.1186/s13059-025-03719-y.

Model-based dimensionality reduction for single-cell RNA-seq using generalized bilinear models.

Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxaf024.

Analysis of individual patient pathway coordination in a cross-species single-cell kidney atlas.

Nat Genet. 2025 Aug 7. doi: 10.1038/s41588-025-02285-0.

Design specifications for biomedical virtual twins in engineered adoptive cellular immunotherapies.

NPJ Digit Med. 2025 Aug 1;8(1):493. doi: 10.1038/s41746-025-01809-6.

Melanocytes and photosensory organs share a common ancestry that illuminates the origins of the neural crest.

Commun Biol. 2025 Jul 23;8(1):1092. doi: 10.1038/s42003-025-08502-0.

scRECL: representative ensembles with contrastive learning for scRNA-seq data clustering analysis.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf346.

scICE: enhancing clustering reliability and efficiency of scRNA-seq data with multi-cluster label consistency evaluation.

Nat Commun. 2025 Jul 2;16(1):6031. doi: 10.1038/s41467-025-60702-8.

Spatial transcriptomics reveals human cortical layer and area specification.

Nature. 2025 May 14. doi: 10.1038/s41586-025-09010-1.

Deconvolution of cell types and states in spatial multiomics utilizing TACIT.

Nat Commun. 2025 Apr 21;16(1):3747. doi: 10.1038/s41467-025-58874-4.

CHOIR improves significance-based detection of cell types and states from single-cell data.

Nat Genet. 2025 May;57(5):1309-1319. doi: 10.1038/s41588-025-02148-8. Epub 2025 Apr 7.

References

A probabilistic gene expression barcode for annotation of cell types from single-cell RNA-seq data.

Biostatistics. 2022 Oct 14;23(4):1150-1164. doi: 10.1093/biostatistics/kxac021.

A transcriptomic atlas of mouse cerebellar cortex comprehensively defines cell types.

Nature. 2021 Oct;598(7879):214-219. doi: 10.1038/s41586-021-03220-z. Epub 2021 Oct 6.

Integrated analysis of multimodal single-cell data.

Cell. 2021 Jun 24;184(13):3573-3587.e29. doi: 10.1016/j.cell.2021.04.048. Epub 2021 May 31.

Selecting single cell clustering parameter values using subsampling-based robustness metrics.

BMC Bioinformatics. 2021 Feb 1;22(1):39. doi: 10.1186/s12859-021-03957-4.

A molecular cell atlas of the human lung from single-cell RNA sequencing.

Nature. 2020 Nov;587(7835):619-625. doi: 10.1038/s41586-020-2922-4. Epub 2020 Nov 18.

Evaluating single-cell cluster stability using the Jaccard similarity index.

Bioinformatics. 2021 Aug 9;37(15):2212-2214. doi: 10.1093/bioinformatics/btaa956.

Identification of cell types from single cell data using stable clustering.

Sci Rep. 2020 Jul 23;10(1):12349. doi: 10.1038/s41598-020-66848-3.

Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model.

Genome Biol. 2019 Dec 23;20(1):295. doi: 10.1186/s13059-019-1861-6.

Valid Post-clustering Differential Analysis for Single-Cell RNA-Seq.

Cell Syst. 2019 Oct 23;9(4):383-392.e6. doi: 10.1016/j.cels.2019.07.012. Epub 2019 Sep 11.

Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity.

Cell. 2019 Jun 13;177(7):1873-1887.e17. doi: 10.1016/j.cell.2019.05.006. Epub 2019 Jun 6.
