SC3s: efficient scaling of single cell consensus clustering to millions of cells.

Affiliations

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK.

The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.

Publication information

BMC Bioinformatics. 2022 Dec 12;23(1):536. doi: 10.1186/s12859-022-05085-z.

DOI:10.1186/s12859-022-05085-z
PMID:36503522
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9743492/
Abstract

BACKGROUND

Today it is possible to profile the transcriptome of individual cells, and a key step in the analysis of these datasets is unsupervised clustering. For very large datasets, efficient algorithms are required to ensure that analyses can be conducted with reasonable time and memory requirements.

RESULTS

Here, we present a highly efficient k-means based approach, and we demonstrate that it scales favorably with the number of cells with regards to time and memory.

CONCLUSIONS

We have demonstrated that our streaming k-means clustering algorithm gives state-of-the-art performance while resource requirements scale favorably for up to 2 million cells.
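
The abstract does not describe how the streaming k-means step works internally. As a rough illustration of the general idea only, the sketch below uses the classic online k-means update, where each incoming point pulls its nearest centroid toward it with a step size of 1/count. This is an assumption chosen for clarity, not the SC3s implementation; the function names `streaming_kmeans` and `nearest` and the choice to seed centroids from the first k points are hypothetical.

```python
def nearest(centroids, point):
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((c - x) ** 2 for c, x in zip(centroids[j], point)),
    )


def streaming_kmeans(batches, k):
    """Online k-means sketch (illustrative assumption, not the SC3s code).

    `batches` is any iterable of lists of points, so the full dataset
    never has to be held in memory at once.
    """
    centroids, counts = [], []
    for batch in batches:
        for point in batch:
            if len(centroids) < k:
                # Seed the centroids with the first k points seen.
                centroids.append([float(x) for x in point])
                counts.append(1)
                continue
            j = nearest(centroids, point)
            counts[j] += 1
            step = 1.0 / counts[j]
            # Incremental mean: move the centroid a fraction of the way to the point.
            centroids[j] = [c + step * (x - c) for c, x in zip(centroids[j], point)]
    return centroids


# Two tiny batches drawn from two well-separated groups.
batches = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[2.0, 0.0], [10.0, 12.0]],
]
print(streaming_kmeans(batches, k=2))  # [[1.0, 0.0], [10.0, 11.0]]
```

Because only the centroids and their counts are kept between batches, memory use in this sketch depends on k and the number of features rather than on the number of cells. Seeding from the first k points is sensitive to input order, so a real pipeline would likely use a more careful initialisation.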


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3ca/9743492/15818f6ea117/12859_2022_5085_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b3ca/9743492/9bb5a303418f/12859_2022_5085_Fig1_HTML.jpg

Similar articles

1
SC3s: efficient scaling of single cell consensus clustering to millions of cells.
BMC Bioinformatics. 2022 Dec 12;23(1):536. doi: 10.1186/s12859-022-05085-z.
2
A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets.
Bioinformatics. 2009 May 1;25(9):1152-7. doi: 10.1093/bioinformatics/btp123. Epub 2009 Mar 4.
3
Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.
BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):660. doi: 10.1186/s12859-019-3179-5.
4
Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree, and Hierarchical Clustering in an Applied Study.
Comput Math Methods Med. 2020 Aug 1;2020:7636857. doi: 10.1155/2020/7636857. eCollection 2020.
5
Recursive Consensus Clustering for novel subtype discovery from transcriptome data.
Sci Rep. 2020 Jul 3;10(1):11005. doi: 10.1038/s41598-020-67016-3.
6
Boosting k-means clustering with symbiotic organisms search for automatic clustering problems.
PLoS One. 2022 Aug 11;17(8):e0272861. doi: 10.1371/journal.pone.0272861. eCollection 2022.
7
Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.
Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.
8
Efficient streaming text clustering.
Neural Netw. 2005 Jun-Jul;18(5-6):790-8. doi: 10.1016/j.neunet.2005.06.008.
9
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.
PLoS One. 2016 Dec 13;11(12):e0167765. doi: 10.1371/journal.pone.0167765. eCollection 2016.
10
SEED: efficient clustering of next-generation sequences.
Bioinformatics. 2011 Sep 15;27(18):2502-9. doi: 10.1093/bioinformatics/btr447. Epub 2011 Aug 2.

Cited by

1
Optimization of clustering parameters for single-cell RNA analysis using intrinsic goodness metrics.
Front Bioinform. 2025 Jun 11;5:1562410. doi: 10.3389/fbinf.2025.1562410. eCollection 2025.
2
scMINER: a mutual information-based framework for clustering and hidden driver inference from single-cell transcriptomics data.
Nat Commun. 2025 May 8;16(1):4305. doi: 10.1038/s41467-025-59620-6.
3
CHOIR improves significance-based detection of cell types and states from single-cell data.
Nat Genet. 2025 May;57(5):1309-1319. doi: 10.1038/s41588-025-02148-8. Epub 2025 Apr 7.
4
A unified analysis of atlas single-cell data.
Genome Res. 2025 May 2;35(5):1219-1233. doi: 10.1101/gr.279631.124.
5
ESCHR: a hyperparameter-randomized ensemble approach for robust clustering across diverse datasets.
Genome Biol. 2024 Sep 16;25(1):242. doi: 10.1186/s13059-024-03386-5.
6
CTEC: a cross-tabulation ensemble clustering approach for single-cell RNA sequencing data analysis.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae130.
7
Anti-correlated feature selection prevents false discovery of subpopulations in scRNAseq.
Nat Commun. 2024 Jan 24;15(1):699. doi: 10.1038/s41467-023-43406-9.
8
MENDER: fast and scalable tissue structure identification in spatial omics data.
Nat Commun. 2024 Jan 5;15(1):207. doi: 10.1038/s41467-023-44367-9.
9
CAKE: a flexible self-supervised framework for enhancing cell visualization, clustering and rare cell identification.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad475.
10
A cofunctional grouping-based approach for non-redundant feature gene selection in unannotated single-cell RNA-seq analysis.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad042.

References

1
Modular, efficient and constant-memory single-cell RNA-seq preprocessing.
Nat Biotechnol. 2021 Jul;39(7):813-818. doi: 10.1038/s41587-021-00870-2. Epub 2021 Apr 1.
2
Putative cell type discovery from single-cell gene expression data.
Nat Methods. 2020 Jun;17(6):621-628. doi: 10.1038/s41592-020-0825-9. Epub 2020 May 18.
3
Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.
BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):660. doi: 10.1186/s12859-019-3179-5.
4
Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis.
Genome Biol. 2019 Dec 10;20(1):269. doi: 10.1186/s13059-019-1898-6.
5
From Louvain to Leiden: guaranteeing well-connected communities.
Sci Rep. 2019 Mar 26;9(1):5233. doi: 10.1038/s41598-019-41695-z.
6
The single-cell transcriptional landscape of mammalian organogenesis.
Nature. 2019 Feb;566(7745):496-502. doi: 10.1038/s41586-019-0969-x. Epub 2019 Feb 20.
7
Challenges in unsupervised clustering of single-cell RNA-seq data.
Nat Rev Genet. 2019 May;20(5):273-282. doi: 10.1038/s41576-018-0088-9.
8
SCANPY: large-scale single-cell gene expression data analysis.
Genome Biol. 2018 Feb 6;19(1):15. doi: 10.1186/s13059-017-1382-0.
9
SC3: consensus clustering of single-cell RNA-seq data.
Nat Methods. 2017 May;14(5):483-486. doi: 10.1038/nmeth.4236. Epub 2017 Mar 27.