School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.
School of Computer Science and Technology, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China.
Int J Mol Sci. 2022 Mar 31;23(7):3900. doi: 10.3390/ijms23073900.
Single-cell RNA sequencing (scRNA-seq) provides transcriptome profiles for individual cells, allowing researchers to explore tissue heterogeneity, distinguish unusual cell identities, and discover novel cellular subtypes. Clustering analysis is commonly used to predict cell class assignments and infer cell identities. However, the performance of existing single-cell clustering methods is extremely sensitive to noisy data and outliers, and existing clustering algorithms easily become trapped in local optima. There is still no consensus on the best-performing method. To address these issues, we introduce two methods for scRNA-seq data: single-cell self-paced clustering (scSPaC), which uses nonnegative matrix factorization (NMF) based on the F-norm (Frobenius norm), and sparse single-cell self-paced clustering (sscSPaC), which uses NMF based on the l2,1-norm. Both gradually add single cells to the model, from simple to complex, until all cells are selected. In this way, the influence of noisy data and outliers can be significantly reduced. The proposed methods achieved the best performance on both simulated and real scRNA-seq data. A case study on clustering scRNA-seq data from human Clara cells and ependymal cells shows that scSPaC is more advantageous near the cluster boundary.
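The self-paced idea described above (fit the factorization on the easiest cells first, then admit harder ones until all cells are selected) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `self_paced_nmf`, the hard 0/1 cell weights, the quantile-based pacing schedule, the multiplicative updates, and the argmax cluster assignment are all assumptions chosen for illustration, and only the F-norm variant is shown.

```python
import numpy as np

def self_paced_nmf(X, k, n_rounds=10, n_inner=50, start_frac=0.5, seed=0, eps=1e-10):
    """Illustrative self-paced NMF on a nonnegative matrix X (genes x cells).

    Assumed scheme: each round selects the cells whose reconstruction error
    falls below a quantile threshold, and the threshold rises until every
    cell is included.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps   # gene-by-factor basis
    H = rng.random((k, n_cells)) + eps   # factor-by-cell coefficients
    selected_counts = []
    for frac in np.linspace(start_frac, 1.0, n_rounds):
        # Per-cell squared reconstruction error: a low error marks an "easy" cell.
        loss = np.sum((X - W @ H) ** 2, axis=0)
        lam = np.quantile(loss, frac)        # pacing threshold for this round
        v = (loss <= lam).astype(float)      # hard weights: 1 = selected cell
        selected_counts.append(int(v.sum()))
        Xv = X * v                           # zero out columns of unselected cells
        for _ in range(n_inner):
            # Weighted multiplicative update: only selected cells influence W.
            W *= (Xv @ H.T) / (W @ ((H * v) @ H.T) + eps)
            # Every column of H is updated so that each cell keeps a current loss.
            H *= (W.T @ X) / (W.T @ W @ H + eps)
    labels = np.argmax(H, axis=0)            # assumed rule: dominant factor = cluster
    return W, H, labels, selected_counts

# Toy data (hypothetical): 30 genes x 40 cells, two groups with distinct gene blocks.
rng = np.random.default_rng(1)
X = rng.random((30, 40)) * 0.1
X[:15, :20] += 1.0
X[15:, 20:] += 1.0
W, H, labels, counts = self_paced_nmf(X, k=2)
print(counts)   # number of cells selected in each round
print(labels)   # cluster index assigned to each cell
```

Because unselected cells have zero weight in the update of `W`, outliers with large reconstruction error cannot distort the basis in early rounds; they are admitted only once the threshold has risen far enough. A soft weighting scheme or a k-means step on `H` could replace the hard weights and the argmax rule without changing the overall structure.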