Robust Graph Regularized NMF with Dissimilarity and Similarity Constraints for ScRNA-seq Data Clustering.
Authors
Shu Zhenqiu, Long Qinghan, Zhang Luping, Yu Zhengtao, Wu Xiao-Jun
Affiliations
Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
Library of Kunming Medical University, Kunming 650031, China.
Publication information
J Chem Inf Model. 2022 Dec 12;62(23):6271-6286. doi: 10.1021/acs.jcim.2c01305. Epub 2022 Dec 2.
Notable progress in single-cell RNA sequencing (ScRNA-seq) technology has made it possible to characterize the heterogeneity and diversity of cells more accurately. Clustering is an essential step in ScRNA-seq data analysis. However, clustering ScRNA-seq data directly does not yield satisfactory performance because the data are high-dimensional and noisy. To address these issues, we propose a novel ScRNA-seq data representation model, termed Robust Graph regularized Non-Negative Matrix Factorization with Dissimilarity and Similarity constraints (RGNMF-DS), for ScRNA-seq data clustering. To accurately characterize the structure information of the labeled samples and the unlabeled samples, respectively, the proposed RGNMF-DS model adopts a pair of complementary regularizers (i.e., similarity and dissimilarity regularizers) to guide the matrix decomposition. In addition, we construct a graph regularizer to discover the local geometric structure hidden in ScRNA-seq data. Moreover, we adopt the L2,1-norm to measure the reconstruction error, which improves the robustness of the proposed RGNMF-DS model to noise. Experimental results on several ScRNA-seq datasets demonstrate that the proposed RGNMF-DS model outperforms other state-of-the-art methods in clustering.
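As a rough illustration of two ingredients named in the abstract, the sketch below evaluates a robust reconstruction error plus a graph regularizer for a given factorization. It is not the authors' implementation. It assumes the robust norm is the L2,1-norm (the sum of column-wise Euclidean norms, a common choice in robust NMF), a binary k-nearest-neighbor graph, and a trade-off weight `lam`. All function and variable names are hypothetical, and the similarity and dissimilarity regularizers and the optimization updates are omitted because the abstract does not specify them.

```python
import numpy as np

def l21_norm(E):
    # Sum of the Euclidean norms of the columns (one column per cell),
    # so a single noisy cell contributes linearly rather than quadratically.
    return float(np.sum(np.sqrt(np.sum(E ** 2, axis=0))))

def knn_laplacian(X, k=3):
    # X is genes x cells. Build a binary, symmetric kNN affinity matrix W
    # over the cells and return the graph Laplacian L = D - W.
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize
    return np.diag(W.sum(axis=1)) - W

def objective(X, U, V, L, lam=1.0):
    # Robust reconstruction error plus graph smoothness Tr(V L V^T),
    # where X ~ U V with U (genes x rank) and V (rank x cells).
    recon = l21_norm(X - U @ V)
    smooth = float(np.trace(V @ L @ V.T))
    return recon + lam * smooth

# Toy data standing in for an expression matrix (20 genes, 12 cells).
rng = np.random.default_rng(0)
X = rng.random((20, 12))
U = rng.random((20, 3))
V = rng.random((3, 12))
L = knn_laplacian(X, k=3)
value = objective(X, U, V, L, lam=0.5)
```

In a full method, `U` and `V` would be updated iteratively under non-negativity constraints to decrease such an objective, and the columns of `V` would then be clustered (for example with k-means) to assign cells to groups.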