Selecting single cell clustering parameter values using subsampling-based robustness metrics.

Author information

Patterson-Cross Ryan B, Levine Ariel J, Menon Vilas

Affiliations

Spinal Circuits and Plasticity Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.

Department of Neurology, Center for Translational and Computational Neuroimmunology, Columbia University, New York City, NY, USA.

Publication information

BMC Bioinformatics. 2021 Feb 1;22(1):39. doi: 10.1186/s12859-021-03957-4.

DOI: 10.1186/s12859-021-03957-4

PMID: 33522897

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7852188/

Abstract

BACKGROUND

Generating and analysing single-cell data has become a widespread approach to examine tissue heterogeneity, and numerous algorithms exist for clustering these datasets to identify putative cell types with shared transcriptomic signatures. However, many of these clustering workflows rely on user-tuned parameter values, tailored to each dataset, to identify a set of biologically relevant clusters. Whereas users often develop their own intuition as to the optimal range of parameters for clustering on each data set, the lack of systematic approaches to identify this range can be daunting to new users of any given workflow. In addition, an optimal parameter set does not guarantee that all clusters are equally well-resolved, given the heterogeneity in transcriptomic signatures in most biological systems.

RESULTS

Here, we illustrate a subsampling-based approach (chooseR) that simultaneously guides parameter selection and characterizes cluster robustness. Through bootstrapped iterative clustering across a range of parameters, chooseR was used to select parameter values for two distinct clustering workflows (Seurat and scVI). In each case, chooseR identified parameters that produced biologically relevant clusters from both well-characterized (human PBMC) and complex (mouse spinal cord) datasets. Moreover, it provided a simple "robustness score" for each of these clusters, facilitating the assessment of cluster quality.
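The abstract describes chooseR only at a high level: cluster many random subsamples for each candidate parameter value, record how consistently cells end up together, and turn that into a per-cluster robustness score. The sketch below illustrates that idea under explicit assumptions and is not the authors' code. A hand-written k-means, with k as the tuned parameter, stands in for the Seurat and scVI workflows; each run clusters 80% of the cells drawn without replacement; the robustness score is assumed to be the median silhouette of each cluster's cells computed on the distance 1 - co-clustering frequency; and the selection rule (highest median cluster score) is a hypothetical simplification. All function names are hypothetical.

```python
import numpy as np


def sq_dist(X, C):
    """Squared Euclidean distances between every row of X and every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Hypothetical stand-in clusterer: k-means++ seeded Lloyd's algorithm, best of n_init runs."""
    best_labels, best_cost = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: new centres drawn with probability proportional to squared distance
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = sq_dist(X, np.array(centers)).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _ in range(n_iter):
            labels = sq_dist(X, centers).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        cost = ((X - centers[labels]) ** 2).sum()  # within-cluster sum of squares
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def co_clustering(X, k, n_sub=30, frac=0.8, seed=0):
    """Fraction of subsamples in which each pair of cells shares a cluster (80% draw is an assumption)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))  # times a pair landed in the same cluster
    sampled = np.zeros((n, n))   # times a pair was drawn together
    for _ in range(n_sub):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans(X[idx], k, rng)
        block = np.ix_(idx, idx)
        sampled[block] += 1
        together[block] += labels[:, None] == labels[None, :]
    return np.divide(together, sampled, out=np.zeros((n, n)), where=sampled > 0)


def cluster_robustness(D, labels):
    """Assumed score: median silhouette per cluster, using D = 1 - co-clustering frequency."""
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i, li in enumerate(labels):
        own = labels == li
        others = [c for c in clusters if c != li]
        if own.sum() == 1 or not others:
            continue  # silhouette undefined: leave at 0
        a = D[i, own].sum() / (own.sum() - 1)  # D[i, i] == 0, so i itself adds nothing
        b = min(D[i, labels == c].mean() for c in others)
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return {int(c): float(np.median(s[labels == c])) for c in clusters}


def choose_parameter(X, ks, seed=0):
    """Hypothetical rule: pick the k whose clusters have the highest median robustness."""
    scores = {}
    for k in ks:
        D = 1.0 - co_clustering(X, k, seed=seed)
        np.fill_diagonal(D, 0.0)
        labels = kmeans(X, k, np.random.default_rng(seed))  # clustering of the full data
        scores[k] = float(np.median(list(cluster_robustness(D, labels).values())))
    return max(scores, key=scores.get), scores  # ties go to the earlier k in ks


# Toy data: three well-separated 2-D blobs of 30 "cells" each.
rng = np.random.default_rng(1)
blob_centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in blob_centres])
best_k, scores = choose_parameter(X, ks=[2, 3, 4, 5])
print(best_k, scores)
```

On this toy input the highest median score should fall at k = 3, where every subsample recovers the same three blobs; at k = 2 or k ≥ 4 the merged or split clusters change from one subsample to the next, which lowers their scores. In a real workflow the candidate values would presumably be clustering parameters such as a Seurat resolution, and the per-cluster dictionary from `cluster_robustness` is what would flag individual weak clusters.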

CONCLUSION

chooseR is a simple, conceptually understandable tool that can be used flexibly across clustering algorithms, workflows, and datasets to guide clustering parameter selection and characterize cluster robustness.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b96/7852188/66f9d6885f75/12859_2021_3957_Fig1_HTML.jpg

Similar articles

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

A robustness metric for biological data clustering algorithms.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 15):503. doi: 10.1186/s12859-019-3089-6.

Improving replicability in single-cell RNA-Seq cell type discovery with Dune.

BMC Bioinformatics. 2024 May 24;25(1):198. doi: 10.1186/s12859-024-05814-6.

Characterization of gene cluster heterogeneity in single-cell transcriptomic data within and across cancer types.

Biol Open. 2022 Jun 15;11(6). doi: 10.1242/bio.059256. Epub 2022 Jun 23.

Evaluating single-cell cluster stability using the Jaccard similarity index.

Bioinformatics. 2021 Aug 9;37(15):2212-2214. doi: 10.1093/bioinformatics/btaa956.

clusterExperiment and RSEC: A Bioconductor package and framework for clustering of single-cell and other large gene expression datasets.

PLoS Comput Biol. 2018 Sep 4;14(9):e1006378. doi: 10.1371/journal.pcbi.1006378. eCollection 2018 Sep.

Clustering trees: a visualization for evaluating clusterings at multiple resolutions.

Gigascience. 2018 Jul 1;7(7). doi: 10.1093/gigascience/giy083.

Methods for evaluating clustering algorithms for gene expression data using a reference set of functional classes.

BMC Bioinformatics. 2006 Aug 31;7:397. doi: 10.1186/1471-2105-7-397.

Effect of data normalization on fuzzy clustering of DNA microarray data.

BMC Bioinformatics. 2006 Mar 14;7:134. doi: 10.1186/1471-2105-7-134.

Cited by

GUIDING CLUSTERING AND ANNOTATION IN SINGLE-CELL RNA SEQUENCING USING THE AVERAGE OVERLAP METRIC.

bioRxiv. 2025 May 10:2025.05.06.652497. doi: 10.1101/2025.05.06.652497.

scICE: enhancing clustering reliability and efficiency of scRNA-seq data with multi-cluster label consistency evaluation.

Nat Commun. 2025 Jul 2;16(1):6031. doi: 10.1038/s41467-025-60702-8.

Optimization of clustering parameters for single-cell RNA analysis using intrinsic goodness metrics.

Front Bioinform. 2025 Jun 11;5:1562410. doi: 10.3389/fbinf.2025.1562410. eCollection 2025.

Heterogeneous pericoerulear neurons tune arousal and exploratory behaviours.

Nature. 2025 May 7. doi: 10.1038/s41586-025-08952-w.

Crosstalk Signaling Between the Epithelial and Non-Epithelial Compartments of the Mouse Inner Ear.

J Assoc Res Otolaryngol. 2025 Apr;26(2):127-145. doi: 10.1007/s10162-025-00980-7. Epub 2025 Mar 13.

Addressing persistent challenges in digital image analysis of cancer tissue: resources developed from a hackathon.

Mol Oncol. 2025 Jun;19(6):1565-1581. doi: 10.1002/1878-0261.13783. Epub 2025 Feb 10.

Exploring the utility of snRNA-seq in profiling human bladder tissue: A comprehensive comparison with scRNA-seq.

iScience. 2024 Dec 18;28(1):111628. doi: 10.1016/j.isci.2024.111628. eCollection 2025 Jan 17.

Single Cell Transcriptomic Profiling of -Associated Hypertrophic Cardiomyopathy Across Species Reveals Conservation of Biological Process But Not Gene Expression.

J Am Heart Assoc. 2025 Jan 7;14(1):e035780. doi: 10.1161/JAHA.124.035780. Epub 2024 Dec 24.

Integration of bulk and single-cell RNA-seq reveals prognostic gene signatures in patients with bladder cancer treated with immune checkpoint inhibitors.

Cancer Immunol Immunother. 2024 Dec 21;74(1):28. doi: 10.1007/s00262-024-03839-7.

Enhancing spatial domain detection in spatial transcriptomics with EnSDD.

Commun Biol. 2024 Oct 21;7(1):1358. doi: 10.1038/s42003-024-07001-y.

