Evaluating single-cell cluster stability using the Jaccard similarity index.

Author affiliations

FAS Informatics Group, Harvard University, Cambridge, MA, USA.

Department of Molecular and Cellular Biology, Center for Brain Science, Harvard University, Cambridge, MA, USA.

Publication information

Bioinformatics. 2021 Aug 9;37(15):2212-2214. doi: 10.1093/bioinformatics/btaa956.

Abstract

MOTIVATION

One major goal of single-cell RNA sequencing (scRNAseq) experiments is to identify novel cell types. With increasingly large scRNAseq datasets, unsupervised clustering methods can now produce detailed catalogues of transcriptionally distinct groups of cells in a sample. However, the interpretation of these clusters is challenging for both technical and biological reasons. Popular clustering algorithms are sensitive to parameter choices and can produce different clustering solutions after even small changes in, among others, the number of principal components used, the number of nearest neighbors (k) and the resolution parameter.

RESULTS

Here, we present a set of tools to evaluate cluster stability by subsampling, which can guide parameter choice and aid in biological interpretation. The R package scclusteval and the accompanying Snakemake workflow implement all steps of the pipeline: subsampling the cells, repeating the clustering with Seurat, estimating cluster stability using the Jaccard similarity index and providing rich visualizations.
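The subsample, recluster and compare loop described above can be illustrated with a minimal Python sketch. It is not the scclusteval implementation (which is written in R and clusters with Seurat): the function names, the 80% subsampling fraction, the number of rounds and the toy threshold-based clustering function are all assumptions made for illustration. For two sets of cells A and B, the Jaccard similarity index is |A ∩ B| / |A ∪ B|. Here each full-data cluster is restricted to the subsampled cells and scored by its best-matching cluster in the reclustered subsample.

```python
import random
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of cell IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def groups(labels):
    """Turn {cell_id: cluster_label} into {cluster_label: set of cell_ids}."""
    out = defaultdict(set)
    for cell, label in labels.items():
        out[label].add(cell)
    return dict(out)


def cluster_stability(cells, cluster_fn, n_rounds=20, fraction=0.8, seed=0):
    """For each full-data cluster, record its best Jaccard match per round.

    cluster_fn is a hypothetical stand-in for the real clustering step; it
    takes {cell_id: features} and returns {cell_id: cluster_label}.
    """
    rng = random.Random(seed)
    original = groups(cluster_fn(cells))
    scores = {label: [] for label in original}
    ids = sorted(cells)
    for _ in range(n_rounds):
        # Subsample cells without replacement (fraction is an assumed value).
        kept = set(rng.sample(ids, int(fraction * len(ids))))
        sub = groups(cluster_fn({c: cells[c] for c in kept}))
        for label, members in original.items():
            # Compare only the cells that survived subsampling.
            restricted = members & kept
            best = max((jaccard(restricted, s) for s in sub.values()), default=0.0)
            scores[label].append(best)
    return scores


def toy_cluster(cells):
    """Toy clustering (assumption): split cells on a fixed 1-D threshold."""
    return {c: ("high" if v >= 5.0 else "low") for c, v in cells.items()}


cells = {f"cell{i}": float(i) for i in range(10)}
scores = cluster_stability(cells, toy_cluster, n_rounds=5)
```

Because the toy clustering is deterministic and ignores which cells are present, every cluster is recovered exactly in each round. With a real, parameter-sensitive clustering method, clusters whose scores are consistently low across rounds would be the ones to treat with caution.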

AVAILABILITY AND IMPLEMENTATION

R package scclusteval: https://github.com/crazyhottommy/scclusteval
Snakemake workflow: https://github.com/crazyhottommy/pyflow_seuratv3_parameter
Tutorial: https://crazyhottommy.github.io/EvaluateSingleCellClustering/
