Hozumi Yuta, Wei Guo-Wei
Department of Mathematics, Michigan State University, East Lansing, Michigan, United States of America.
Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan, United States of America.
PLoS One. 2024 Dec 13;19(12):e0311791. doi: 10.1371/journal.pone.0311791. eCollection 2024.
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging because of its sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Correlated clustering and projection (CCP) was recently introduced as an effective method for preprocessing scRNA-seq data. CCP utilizes gene-gene correlations to partition the genes and, based on the partition, employs cell-cell interactions to obtain super-genes. Because CCP is a data-domain approach that does not require matrix diagonalization, it can be used in many downstream machine learning tasks. In this work, we utilize CCP as an initialization tool for uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (tSNE). Using 21 publicly available datasets, we find that CCP significantly improves UMAP and tSNE visualization and dramatically improves their accuracy. More specifically, CCP improves UMAP by 22% in ARI, 14% in NMI and 15% in ECM, and improves tSNE by 11% in ARI, 9% in NMI and 8% in ECM.
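To make the reported ARI figures concrete, the following is a minimal sketch of how an adjusted Rand index can be computed from two labelings of the same cells, for example reference cell-type annotations and cluster labels obtained from an embedding. It illustrates the standard ARI formula only; the function name `adjusted_rand_index`, the toy labels, and the assumption that clusters are compared against reference annotations are illustrative and not taken from the paper's code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two items: the labelings trivially agree.
        return 1.0

    # Contingency counts: how many items fall in each (true, predicted) pair.
    contingency = Counter(zip(labels_true, labels_pred))
    true_sizes = Counter(labels_true)
    pred_sizes = Counter(labels_pred)

    # Number of item pairs that are grouped together under each view.
    pairs_both = sum(comb(c, 2) for c in contingency.values())
    pairs_true = sum(comb(c, 2) for c in true_sizes.values())
    pairs_pred = sum(comb(c, 2) for c in pred_sizes.values())

    # Expected agreement under random labelings with the same cluster sizes.
    expected = pairs_true * pairs_pred / total_pairs
    max_index = 0.5 * (pairs_true + pairs_pred)

    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


# Toy example: cluster names differ, but the grouping is identical.
reference = ["T", "T", "B", "B"]
clusters = [1, 1, 0, 0]
score = adjusted_rand_index(reference, clusters)
```

An ARI of 1 means the two groupings coincide up to relabeling, values near 0 indicate chance-level agreement, and negative values indicate agreement below chance.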