Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study.

Affiliations

Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Zhuhai Sub Laboratory of Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Zhuhai College of Jilin University, Zhuhai 519041, China.

Publication information

Int J Mol Sci. 2020 Mar 22;21(6):2181. doi: 10.3390/ijms21062181.

PMID:32235704

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7139673/

Abstract

With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Clustering of single-cell RNA sequencing (scRNA-seq) data can group cells of the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitations of existing scRNA-seq technologies. Traditional clustering methods are neither effective nor efficient for high-dimensional, sparse matrix computations, so several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were performed progressively on two large scRNA-seq datasets using 20 models. The results showed that feature selection contributed positively to the analysis of high-dimensional, sparse scRNA-seq data. Feature-extraction methods were also able to improve clustering performance, although not in every setting. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis (PCA) was steadier than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering achieved good results when combined with feature-extraction methods.
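The routine summarized in the abstract (feature selection, then feature extraction, then clustering) can be illustrated with a minimal sketch. This is not the authors' code. It covers only one of the 20 model combinations (PCA followed by K-means) and runs on synthetic data. The variance-based gene filter, the toy count generator, the farthest-point initialization, and all function names and parameter values are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def make_toy_counts(n_per_type=60, n_genes=200, seed=0):
    """Hypothetical toy data: two cell types, 10 marker genes each, with dropout."""
    rng = np.random.default_rng(seed)
    rates = np.full((2, n_genes), 0.2)      # low background expression
    rates[0, :10] = 20.0                    # markers of cell type 0
    rates[1, 10:20] = 20.0                  # markers of cell type 1
    truth = np.repeat([0, 1], n_per_type)
    counts = rng.poisson(rates[truth])
    counts = counts * (rng.random(counts.shape) > 0.3)  # zero out ~30% of entries
    return counts.astype(float), truth

def select_variable_genes(X, n_genes):
    """Feature selection (assumed variant): keep the highest-variance genes."""
    idx = np.argsort(X.var(axis=0))[::-1][:n_genes]
    return X[:, np.sort(idx)]

def pca(X, n_components):
    """Feature extraction: project centered data onto the top principal axes."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

counts, truth = make_toy_counts()
X = np.log1p(counts)                          # common log transform of counts
X_sel = select_variable_genes(X, n_genes=50)  # 200 genes -> 50 genes
Z = pca(X_sel, n_components=5)                # 50 genes -> 5 components
labels = kmeans(Z, k=2)

# Cluster ids are arbitrary, so score against both possible label assignments.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"selected: {X_sel.shape}, embedded: {Z.shape}, agreement: {agreement:.2f}")
```

On real data the numbers of selected genes, components, and clusters would need tuning, and the study's findings about ICA and fuzzy C-means are not reproduced by this sketch.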

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/df07/7139673/67627ae4655c/ijms-21-02181-g001.jpg
