K-nearest-neighbors induced topological PCA for single cell RNA-sequence data analysis.

Affiliations

Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA.

Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.

Publication details

Comput Biol Med. 2024 Jun;175:108497. doi: 10.1016/j.compbiomed.2024.108497. Epub 2024 Apr 24.

Abstract

Single-cell RNA sequencing (scRNA-seq) is widely used to reveal cellular heterogeneity, giving insights into cell-cell communication, cell differentiation, and differential gene expression. However, scRNA-seq data are hard to analyze because they are sparse and involve a large number of genes. Dimensionality reduction and feature selection are therefore important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse of dimensionality reduction, cannot capture the geometric structure embedded in the data, and previous graph Laplacian regularizations are limited to a single scale. We propose a topological principal component analysis (tPCA) method that combines the persistent Laplacian (PL) technique with L2,1-norm regularization to address multiscale and multiclass heterogeneity in the data. We further introduce a k-nearest-neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses many limitations of traditional persistent homology. Rather than inducing a filtration by varying a distance threshold, kNN-tPCA induces it by varying the number of neighbors in a kNN network at each step, and we find that this framework has significant implications for hyperparameter tuning. We validate tPCA and kNN-tPCA on 11 diverse benchmark scRNA-seq datasets and show that they outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbor Embedding (tSNE), and Non-Negative Matrix Factorization (NMF), by significant margins.
For example, in classification measured by F1 score, tPCA improves on UMAP, tSNE, and NMF by up to 628%, 78%, and 149%, respectively; in clustering measured by ARI, kNN-tPCA improves on them by 53%, 63%, and 32%, respectively.
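The kNN-induced filtration described in the abstract can be illustrated with a minimal NumPy sketch. It rests on several assumptions not stated in the abstract. At each filtration step it uses only the 0-dimensional Laplacian with no persistence window, which for a graph on a fixed vertex set is the ordinary graph Laplacian L_k = D_k - A_k of the symmetrized kNN graph. The steps are combined with hypothetical weights (uniform by default). The PCA step uses the closed-form solution of the Frobenius-norm graph-Laplacian PCA (gLPCA) objective, min ||X^T - U Q^T||_F^2 + alpha * tr(Q^T L Q) subject to Q^T Q = I, rather than the authors' sparsity-regularized tPCA solver. The function names and the parameters `ks`, `weights`, and `alpha` are illustrative only.

```python
import numpy as np


def knn_adjacency(X: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 adjacency matrix of the kNN graph on the rows (cells) of X."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between cells.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    # Taking a prefix of one deterministic sort order means the k nearest
    # neighbors are a subset of the k+1 nearest, so graphs for increasing k
    # are nested, i.e. they form a genuine filtration.
    nbrs = np.argsort(d2, axis=1, kind="stable")[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)  # edge if either cell lists the other


def knn_filtration_laplacian(X, ks, weights=None):
    """Weighted sum of graph Laplacians L_k = D_k - A_k over increasing k in ks."""
    if weights is None:
        weights = np.full(len(ks), 1.0 / len(ks))  # hypothetical uniform weights
    n = X.shape[0]
    L = np.zeros((n, n))
    for k, w in zip(ks, weights):
        A = knn_adjacency(X, k)
        L += w * (np.diag(A.sum(axis=1)) - A)
    return L


def graph_regularized_pca(X, L, n_components, alpha):
    """Closed-form gLPCA-style embedding (Frobenius norm, not an L2,1 solver)."""
    Xc = X - X.mean(axis=0)  # center each gene
    # Minimizing ||Xc^T - U Q^T||_F^2 + alpha * tr(Q^T L Q) with Q^T Q = I and
    # U = Xc^T Q gives Q = eigenvectors of (-Xc Xc^T + alpha L) with the
    # smallest eigenvalues.
    G = -(Xc @ Xc.T) + alpha * L
    _, evecs = np.linalg.eigh(G)  # eigenvalues come back in ascending order
    Q = evecs[:, :n_components]  # cells x components (the embedding)
    U = Xc.T @ Q  # genes x components (the loadings)
    return Q, U


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "expression" matrix: two groups of 20 cells x 50 genes.
    X = np.vstack([rng.normal(0.0, 1.0, (20, 50)), rng.normal(3.0, 1.0, (20, 50))])
    L = knn_filtration_laplacian(X, ks=[2, 4, 6, 8])
    Q, U = graph_regularized_pca(X, L, n_components=2, alpha=1.0)
    print(Q.shape, U.shape)  # (40, 2) (50, 2)
```

Summing Laplacians over several k values, instead of fixing one k, is what lets the regularizer see neighborhood structure at multiple scales. It is also why the abstract links the kNN filtration to hyperparameter tuning: the single k choice is replaced by a range of k values.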
