K-nearest-neighbors induced topological PCA for single cell RNA-sequence data analysis.

Author affiliations

Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA.

Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.

Publication information

Comput Biol Med. 2024 Jun;175:108497. doi: 10.1016/j.compbiomed.2024.108497. Epub 2024 Apr 24.

DOI:10.1016/j.compbiomed.2024.108497

PMID:38678944

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11090715/

Abstract

Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited by the analysis of only a single scale. We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with L2,1 norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique which addresses many of the limitations of traditional persistent homology. Rather than inducing a filtration by varying a distance threshold, we introduce kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and show that our methods outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF) by significant margins. For example, tPCA provides up to 628%, 78%, and 149% improvements over UMAP, tSNE, and NMF, respectively, on classification in the F1 metric, and kNN-tPCA offers 53%, 63%, and 32% improvements over UMAP, tSNE, and NMF, respectively, on clustering in the ARI metric.
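
The kNN-induced filtration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It only assumes that the filtration is a nested sequence of symmetrized kNN graphs obtained by increasing k, and that their graph Laplacians are combined by a weighted sum. The function names, the choice of k values, and the weights are hypothetical, and the L2,1-regularized PCA objective is not reproduced here.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph on the rows of X."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between rows.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    # Indices of the k closest points for every row.
    idx = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    # Keep an edge if either endpoint lists the other as a neighbor.
    return np.maximum(A, A.T)

def graph_laplacian(A):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def knn_filtration_laplacian(X, ks, weights):
    """Weighted sum of Laplacians over kNN graphs with increasing k (hypothetical weighting)."""
    return sum(w * graph_laplacian(knn_adjacency(X, k)) for k, w in zip(ks, weights))

# Toy data: 20 "cells" with 5 "genes" each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
ks = [2, 4, 6]             # filtration steps: number of neighbors
weights = [1.0, 1.0, 1.0]  # assumed equal weights; these would be hyper-parameters
L = knn_filtration_laplacian(X, ks, weights)
```

Because each step is indexed by an integer number of neighbors rather than a distance threshold, the filtration steps do not depend on the scale of the data, which is one plausible reading of the hyper-parameter tuning remark in the abstract.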

Similar articles

K-Nearest-Neighbors Induced Topological PCA for Single Cell RNA-Sequence Data Analysis.

ArXiv. 2023 Oct 23:arXiv:2310.14521v1.

Analyzing scRNA-seq data by CCP-assisted UMAP and tSNE.

PLoS One. 2024 Dec 13;19(12):e0311791. doi: 10.1371/journal.pone.0311791. eCollection 2024.

Analyzing Single Cell RNA Sequencing with Topological Nonnegative Matrix Factorization.

J Comput Appl Math. 2024 Aug 1;445. doi: 10.1016/j.cam.2024.115842. Epub 2024 Feb 19.

Preprocessing of Single Cell RNA Sequencing Data Using Correlated Clustering and Projection.

J Chem Inf Model. 2024 Apr 8;64(7):2829-2838. doi: 10.1021/acs.jcim.3c00674. Epub 2023 Jul 4.

Dimensionality Reduction of Single-Cell RNA-Seq Data.

Methods Mol Biol. 2021;2284:331-342. doi: 10.1007/978-1-0716-1307-8_18.

Computational solutions for spatial transcriptomics.

Comput Struct Biotechnol J. 2022 Sep 1;20:4870-4884. doi: 10.1016/j.csbj.2022.08.043. eCollection 2022.

Visualizing Single-Cell RNA-seq Data with Semisupervised Principal Component Analysis.

Int J Mol Sci. 2020 Aug 12;21(16):5797. doi: 10.3390/ijms21165797.

Single-cell RNA sequencing data analysis based on non-uniform ε-neighborhood network.

Bioinformatics. 2022 Apr 28;38(9):2459-2465. doi: 10.1093/bioinformatics/btac114.

Multi-level multi-view network based on structural contrastive learning for scRNA-seq data clustering.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae562.

Cited by

Dimensionality reduction for k-means clustering of large-scale influenza mutation datasets.

ArXiv. 2025 Apr 4:arXiv:2504.03550v1.

Developing and validating a machine learning model to predict multidrug-resistant -related septic shock.

Front Immunol. 2025 Jan 10;15:1539465. doi: 10.3389/fimmu.2024.1539465. eCollection 2024.

Multiscale Cell-Cell Interactive Spatial Transcriptomics Analysis.

Res Sq. 2025 Jan 3:rs.3.rs-5743704. doi: 10.21203/rs.3.rs-5743704/v1.

Multiscale differential geometry learning of networks with applications to single-cell RNA sequencing data.

Comput Biol Med. 2024 Mar;171:108211. doi: 10.1016/j.compbiomed.2024.108211. Epub 2024 Feb 28.
