CDSKNN: a novel clustering framework for large-scale single-cell data based on a stable graph structure.

Authors

Ren Jun, Lyu Xuejing, Guo Jintao, Shi Xiaodong, Zhou Ying, Li Qiyuan

Affiliations

School of Informatics, Xiamen University, Xiamen, 361105, China.

Department of Hematology, The First Affiliated Hospital of Xiamen University and Institute of Hematology, School of Medicine, Xiamen University, Xiamen, 361102, China.

Publication information

J Transl Med. 2024 Mar 3;22(1):233. doi: 10.1186/s12967-024-05009-w.

DOI: 10.1186/s12967-024-05009-w
PMID: 38433205
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10910752/
Abstract

BACKGROUND

Accurate and efficient cell grouping is essential for analyzing single-cell transcriptome sequencing (scRNA-seq) data. However, existing clustering techniques often struggle to provide timely and accurate cell type groupings on large-scale datasets or datasets with imbalanced cell types. Improved methods are therefore needed that can handle the increasing size of scRNA-seq datasets while maintaining high accuracy and efficiency.

METHODS

We propose CDSKNN (Community Detection based on a Stable K-Nearest Neighbor Graph Structure), a novel single-cell clustering framework that integrates a partition clustering algorithm with a community detection algorithm and achieves accurate, fast cell type grouping by finding a stable graph structure.
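
The abstract gives no algorithmic detail, so the following is only a minimal sketch of how a partition step and a community detection step could be combined, not the authors' implementation. It makes several assumptions that are not in the source: k-means serves as the partition clustering algorithm, connected components of a k-nearest-neighbor graph over the k-means centroids serve as a deliberately simple stand-in for a community detection algorithm, and "stable" is read as "the number of communities stays the same for `patience` consecutive values of k". All function and parameter names (`kmeans`, `knn_graph`, `components`, `stable_communities`, `cluster_cells`, `n_micro`, `k_max`, `patience`) are hypothetical.

```python
import math
import random


def kmeans(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)

    def nearest(p):
        return min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, [nearest(p) for p in points]


def knn_graph(nodes, k):
    """Undirected graph: i and j are linked if either is among the other's k nearest."""
    adj = {i: set() for i in range(len(nodes))}
    for i, p in enumerate(nodes):
        others = sorted((j for j in range(len(nodes)) if j != i),
                        key=lambda j: math.dist(p, nodes[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def components(adj):
    """Connected components (assumed stand-in for community detection)."""
    label, current = {}, 0
    for start in adj:
        if start in label:
            continue
        label[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in label:
                    label[nb] = current
                    stack.append(nb)
        current += 1
    return label


def stable_communities(nodes, k_max, patience=2):
    """Labels at the first k whose community count repeats `patience` times in a row."""
    history = []
    for k in range(1, k_max + 1):
        comm = components(knn_graph(nodes, k))
        history.append((len(set(comm.values())), comm))
        recent = history[-patience:]
        if len(recent) == patience and len({n for n, _ in recent}) == 1:
            return recent[0][1]
    return history[-1][1]  # fallback: no plateau found up to k_max


def cluster_cells(points, n_micro, k_max, seed=0):
    """Partition cells into micro-clusters, then merge them via the stable graph."""
    centroids, micro = kmeans(points, n_micro, seed=seed)
    comm = stable_communities(centroids, k_max)
    return [comm[m] for m in micro]


# Usage on synthetic 2-D "cells": two Gaussian blobs of 60 points each.
rng = random.Random(1)
cells = [(rng.gauss(cx, 0.5), rng.gauss(0.0, 0.5)) for cx in (0.0, 8.0) for _ in range(60)]
labels = cluster_cells(cells, n_micro=8, k_max=5)
print(len(set(labels)), "communities found")
```

Running community detection on a handful of centroids rather than on every cell is what would keep the graph step cheap in such a design; whether CDSKNN is organized this way is not stated in the abstract.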

RESULTS

We evaluated the effectiveness of our approach by analyzing 15 tissues from the human fetal atlas. Compared with existing methods, CDSKNN counteracts the high imbalance in single-cell data and clusters it effectively. We also ran comparisons across multiple single-cell datasets from different studies and sequencing techniques. CDSKNN is highly applicable and robust, and it handles the complexities of diverse data types. Most importantly, CDSKNN is more efficient on datasets at the million-cell scale, requiring an average of only 6.33 min to cluster 1.46 million single cells and saving 33.3% to 99% of the running time of existing methods.
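
To put the reported savings in absolute terms, the short calculation below converts them into the runtimes they would imply for the comparison methods. It assumes, as the abstract does not spell out, that a saving s is defined as 1 - t_CDSKNN / t_other and that the 6.33 min average applies to every comparison; the function name `implied_baseline_minutes` is hypothetical, and the results are rough implied values, not figures from the paper.

```python
def implied_baseline_minutes(method_minutes, saving_fraction):
    """Runtime of a comparison method implied by a fractional time saving."""
    return method_minutes / (1.0 - saving_fraction)


cdsknn_minutes = 6.33  # average reported in the abstract for 1.46 million cells

fastest_other = implied_baseline_minutes(cdsknn_minutes, 0.333)  # about 9.5 min
slowest_other = implied_baseline_minutes(cdsknn_minutes, 0.99)   # about 633 min

print(f"implied range: {fastest_other:.1f} to {slowest_other:.0f} min")
```

Under these assumptions, the comparison methods would have needed roughly ten minutes to about ten and a half hours on the same dataset.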

CONCLUSIONS

CDSKNN is a flexible, resilient, and promising clustering tool that is particularly suitable for clustering imbalanced data and is highly efficient on large-scale scRNA-seq datasets.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba72/10910752/65c51a02b945/12967_2024_5009_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba72/10910752/3cf4725be750/12967_2024_5009_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba72/10910752/18fde5e4a587/12967_2024_5009_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba72/10910752/0528018f409c/12967_2024_5009_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba72/10910752/13a192def062/12967_2024_5009_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba72/10910752/7d381d89c38c/12967_2024_5009_Fig6_HTML.jpg
