Yu Yue, Zhang Wei, Zheng Xiaoying, Shen Juan, Li Yuanyuan
School of Sciences, East China Jiaotong University, Nanchang, 330013, China.
School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, 430205, China.
Interdiscip Sci. 2025 Sep 2. doi: 10.1007/s12539-025-00762-y.
Single-cell RNA sequencing (scRNA-seq) offers significant opportunities to reveal cellular heterogeneity and diversity. Accurate cell type identification is critical for downstream analyses and for understanding the mechanisms of heterogeneity. However, the high dimensionality, sparsity, and noise of scRNA-seq data make this difficult. Although various low-rank representation (LRR)-based clustering methods have been developed, many of them may capture relationships between cells inaccurately or conflate true patterns with noise. To address these limitations, we introduce a novel clustering algorithm, LRMGC, that integrates low-rank matrix decomposition with local graph regularization. The approach applies a tri-decomposition strategy to the representation matrix to derive an aligned core matrix, and characterizes the "distance" between cells in a lower-dimensional space through a local manifold regularization term. Rather than relying on the nuclear norm of the representation matrix, LRMGC applies the Schatten p-norm to the core matrix, so that the similarity matrix is learned robustly in the presence of noise and outliers while the underlying subspace structure of the high-dimensional noisy data is preserved for accurate clustering. The final similarity matrix is then obtained by applying an angular alignment strategy to the learned similarity matrix. Comprehensive experiments and comparisons with advanced methods on scRNA-seq datasets demonstrate LRMGC's superior performance and reliability in uncovering cell type composition. Furthermore, a variety of downstream analyses, such as marker gene identification, functional enrichment analysis, rare cell recognition, and cell-cell communication, also demonstrate the effectiveness of LRMGC.
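The abstract does not state the LRMGC objective, so the following is only a minimal NumPy sketch of two generic ingredients it names: a Schatten p-norm computed from singular values (which reduces to the nuclear norm at p = 1) and a graph-Laplacian manifold penalty. All function names are hypothetical, and the binary k-nearest-neighbor graph is an assumption made for illustration; it is not taken from the paper and should not be read as the authors' implementation.

```python
import numpy as np


def schatten_p_norm(M, p):
    """Return (sum_i sigma_i^p)^(1/p) over the singular values of M.

    p = 1 gives the nuclear norm and p = 2 the Frobenius norm;
    for 0 < p < 1 the quantity is only a quasi-norm.
    """
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))


def knn_laplacian(X, k):
    """Unnormalized Laplacian L = D - W of a symmetrized binary k-NN graph.

    Columns of X are cells (samples). Binary edge weights are an
    illustrative assumption, not something the abstract specifies.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances between columns of X.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(d2, np.inf)  # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # an edge exists if either cell lists the other
    return np.diag(W.sum(axis=1)) - W


def manifold_penalty(Z, L):
    """tr(Z L Z^T) = 1/2 * sum_ij W_ij * ||z_i - z_j||^2 for columns z_i of Z."""
    return float(np.trace(Z @ L @ Z.T))
```

For example, `schatten_p_norm(M, 0.5)` penalizes large singular values less heavily than the nuclear norm `schatten_p_norm(M, 1)`, which is the usual motivation for choosing p < 1 in low-rank models; `manifold_penalty` is small when cells that are neighbors in the graph have similar columns in `Z`.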