IEEE J Biomed Health Inform. 2022 Mar;26(3):1394-1405. doi: 10.1109/JBHI.2021.3099127. Epub 2022 Mar 7.
Fast-developing single-cell technologies create unprecedented opportunities to reveal cell heterogeneity and diversity. Accurate classification of single cells is a critical prerequisite for recovering the mechanisms of heterogeneity. However, currently available scRNA-seq profiles are high-dimensional, sparse, and noisy, which poses challenges for existing clustering methods in grouping cells that belong to the same subpopulation based on transcriptomic profiles. Although many computational methods have been proposed, accurately identifying cell types remains a considerable challenge. We present a new computational framework, named NMFLRR, that identifies cell types by integrating low-rank representation (LRR) and nonnegative matrix factorization (NMF). The LRR captures the global properties of the original data by using the nuclear norm, and a locality-constrained graph regularization term is introduced to characterize the data's local geometric information. The similarity matrix and the low-dimensional features of the data are obtained simultaneously by applying the alternating direction method of multipliers (ADMM), which updates each variable alternately in an iterative way. Finally, the predicted cell types are obtained by applying a spectral algorithm to the optimized similarity matrix. Nine real scRNA-seq datasets were used to test the performance of NMFLRR and fifteen other competitive methods, and the accuracy and robustness of the simulation results suggest that NMFLRR is a promising algorithm for the classification of single cells. The simulation code is freely available at: https://github.com/wzhangwhu/NMFLRR_code.
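To make two of the building blocks named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the nuclear-norm subproblem inside ADMM is handled by singular value thresholding (the standard proximal operator of the nuclear norm) and that the final "spectral algorithm" is ordinary normalized spectral clustering followed by k-means. The function names `singular_value_threshold` and `spectral_labels`, the toy matrix `S`, and the deterministic k-means initialization are all hypothetical choices made for illustration; the NMF and graph regularization terms are not modeled here.

```python
import numpy as np


def singular_value_threshold(M, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau.

    Assumption: this is the usual update for a nuclear-norm term in ADMM;
    the exact update used by NMFLRR is not stated in the abstract.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt


def spectral_labels(S, k, n_iter=50):
    """Cluster n samples into k groups from an n x n similarity matrix S."""
    # Symmetrize and take magnitudes, since a learned representation
    # matrix is not guaranteed to be symmetric or nonnegative.
    W = (np.abs(S) + np.abs(S.T)) / 2.0
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Normalized affinity D^{-1/2} W D^{-1/2}; its top-k eigenvectors span
    # the same space as the bottom-k eigenvectors of the normalized Laplacian.
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(A)  # eigenvalues returned in ascending order
    X = vecs[:, -k:]
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)

    # Plain k-means with a deterministic farthest-point initialization
    # (an illustrative choice, used here to avoid random seeds).
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq_dist, axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Toy usage: 7 hypothetical "cells" in two groups with strong
# within-group similarity and weak between-group similarity.
S = np.full((7, 7), 0.01)
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
labels = spectral_labels(S, k=2)
low_rank = singular_value_threshold(np.diag([3.0, 1.0]), tau=2.0)
```

In this toy case the first three cells should receive one label and the remaining four the other, and thresholding at `tau=2.0` removes the smaller singular value, leaving a rank-one matrix. In a full pipeline the similarity matrix would come from the optimized representation rather than being constructed by hand.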