CTEC: a cross-tabulation ensemble clustering approach for single-cell RNA sequencing data analysis.

Affiliations

AI Lab, Shenzhen 518054, China.

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, 999077, China.

Publication information

Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae130.

DOI:10.1093/bioinformatics/btae130

PMID:38552307

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10985676/

Abstract

MOTIVATION

Cell-type clustering is a crucial first step in single-cell RNA-seq data analysis. However, existing clustering methods often produce different cluster assignments depending on their data pre-processing, choice of distance metric, and feature-extraction strategy, which limits their practical application.

RESULTS

We propose the Cross-Tabulation Ensemble Clustering (CTEC) method, which formulates two re-clustering strategies (distribution- and outlier-based) via cross-tabulation. Benchmarking experiments on five scRNA-seq datasets show that CTEC offers significant improvements over the individual clustering methods. Moreover, CTEC-DB outperforms the state-of-the-art ensemble methods for single-cell data clustering, with 45.4% and 17.1% improvement over the Single-cell Aggregated (from Ensemble) clustering method (SAFE) and the Single-cell Aggregated clustering via Mixture model Ensemble method (SAME), respectively, on the two-method ensemble test.
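To make the idea of cross-tabulating two clustering results concrete, the following is a minimal sketch, not the authors' implementation. It counts how many cells fall into each pair of cluster labels produced by two methods, then separates cells where the methods broadly agree from cells that might be candidates for re-clustering. The function names, the toy labels, and the `min_fraction` threshold are all hypothetical and do not come from the paper or its repository.

```python
from collections import Counter


def cross_tabulate(labels_a, labels_b):
    """Count how many cells fall into each (cluster_a, cluster_b) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same cells")
    return Counter(zip(labels_a, labels_b))


def consensus_and_ambiguous(labels_a, labels_b, min_fraction=0.5):
    """Split cell indices into 'consensus' and 'ambiguous' groups.

    A cell is 'consensus' when its (a, b) pair holds at least `min_fraction`
    of the cells in its method-A cluster. The threshold is a hypothetical
    illustration parameter, not a value taken from the paper.
    """
    table = cross_tabulate(labels_a, labels_b)
    size_a = Counter(labels_a)
    consensus, ambiguous = [], []
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        if table[(a, b)] / size_a[a] >= min_fraction:
            consensus.append(i)
        else:
            ambiguous.append(i)
    return consensus, ambiguous


# Toy labels from two hypothetical clustering methods over eight cells.
method_a = [0, 0, 0, 0, 1, 1, 1, 1]
method_b = [0, 0, 0, 1, 1, 1, 1, 0]

table = cross_tabulate(method_a, method_b)
consensus, ambiguous = consensus_and_ambiguous(method_a, method_b)
print(dict(table))
print("consensus cells:", consensus)
print("ambiguous cells:", ambiguous)
```

In this sketch the ambiguous cells are only flagged; how CTEC actually reassigns such cells under its distribution- and outlier-based strategies is described in the paper and its repository, not here.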

AVAILABILITY AND IMPLEMENTATION

The source code of the benchmark in this work is available at the GitHub repository https://github.com/LWCHN/CTEC.git.

Similar articles

scBGEDA: deep single-cell clustering analysis via a dual denoising autoencoder with bipartite graph ensemble clustering.

Bioinformatics. 2023 Feb 14;39(2). doi: 10.1093/bioinformatics/btad075.

SCMcluster: a high-precision cell clustering algorithm integrating marker gene set with single-cell RNA sequencing data.

Brief Funct Genomics. 2023 Jul 17;22(4):329-340. doi: 10.1093/bfgp/elad004.

SAFE-clustering: Single-cell Aggregated (from Ensemble) clustering for single-cell RNA-seq data.

Bioinformatics. 2019 Apr 15;35(8):1269-1277. doi: 10.1093/bioinformatics/bty793.

Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):660. doi: 10.1186/s12859-019-3179-5.

SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble.

Nucleic Acids Res. 2020 Jan 10;48(1):86-95. doi: 10.1093/nar/gkz959.

scDAC: deep adaptive clustering of single-cell transcriptomic data with coupled autoencoder and Dirichlet process mixture model.

Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae198.

scHFC: a hybrid fuzzy clustering method for single-cell RNA-seq data optimized by natural computation.

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbab588.

Machine learning and statistical methods for clustering single-cell RNA-sequencing data.

Brief Bioinform. 2020 Jul 15;21(4):1209-1223. doi: 10.1093/bib/bbz063.

scGAC: a graph attentional architecture for clustering single-cell RNA-seq data.

Bioinformatics. 2022 Apr 12;38(8):2187-2193. doi: 10.1093/bioinformatics/btac099.

Cited by

RGCN-BA: relational graph convolutional network with batch awareness for single-cell RNA sequencing clustering.

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf378.

scMEDAL for the interpretable analysis of single-cell transcriptomics data with batch effect visualization using a deep mixed effects autoencoder.

Res Sq. 2025 Mar 19:rs.3.rs-6081478. doi: 10.21203/rs.3.rs-6081478/v1.

scMEDAL for the interpretable analysis of single-cell transcriptomics data with batch effect visualization using a deep mixed effects autoencoder.

ArXiv. 2025 Mar 13:arXiv:2411.06635v3.

scDRMAE: integrating masked autoencoder with residual attention networks to leverage omics feature dependencies for accurate cell clustering.

Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae599.

Evolutionary Mechanism Based Conserved Gene Expression Biclustering Module Analysis for Breast Cancer Genomics.

Biomedicines. 2024 Sep 12;12(9):2086. doi: 10.3390/biomedicines12092086.

References

SC3s: efficient scaling of single cell consensus clustering to millions of cells.

BMC Bioinformatics. 2022 Dec 12;23(1):536. doi: 10.1186/s12859-022-05085-z.

Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data.

PLoS Comput Biol. 2022 Dec 5;18(12):e1010753. doi: 10.1371/journal.pcbi.1010753. eCollection 2022 Dec.

Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data.

Nat Commun. 2022 Aug 8;13(1):4616. doi: 10.1038/s41467-022-32097-3.

A joint deep learning model enables simultaneous batch effect correction, denoising, and clustering in single-cell transcriptomics.

Genome Res. 2021 Oct;31(10):1753-1766. doi: 10.1101/gr.271874.120. Epub 2021 May 25.

COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas.

Cell. 2021 Apr 1;184(7):1895-1913.e19. doi: 10.1016/j.cell.2021.01.053. Epub 2021 Feb 3.

On the Robustness of Graph-Based Clustering to Random Network Alterations.

Mol Cell Proteomics. 2021;20:100002. doi: 10.1074/mcp.RA120.002275. Epub 2020 Nov 24.

Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq.

Nat Methods. 2020 Aug;17(8):793-798. doi: 10.1038/s41592-020-0905-x. Epub 2020 Jul 27.

Deep learning enables accurate clustering with batch effect removal in single-cell RNA-seq analysis.

Nat Commun. 2020 May 11;11(1):2338. doi: 10.1038/s41467-020-15851-3.

SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble.

Nucleic Acids Res. 2020 Jan 10;48(1):86-95. doi: 10.1093/nar/gkz959.

From Louvain to Leiden: guaranteeing well-connected communities.

Sci Rep. 2019 Mar 26;9(1):5233. doi: 10.1038/s41598-019-41695-z.
