IEEE J Biomed Health Inform. 2022 Jan;26(1):458-467. doi: 10.1109/JBHI.2021.3091506. Epub 2022 Jan 17.
Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells, which helps reveal complex cellular processes such as mutation and differentiation. Recognizing cell heterogeneity is one of the most critical tasks in scRNA-seq research. To address it, we propose MscNMF, a non-negative matrix factorization (NMF) framework based on multi-subspace cell similarity learning for unsupervised scRNA-seq data analysis. MscNMF consists of three parts: data decomposition, similarity learning, and similarity fusion, which work together to learn the similarity between cells. MscNMF learns gene features and cell features in different subspaces, where the correlation and heterogeneity between cells become more prominent. Redundant information and noise are eliminated in each low-dimensional feature space, and the gene weight information of each space can be further analyzed to calculate the optimal number of subpopulations. Fusing the cell similarity information from the different subspaces improves the final learned cell similarity. An advantage of MscNMF is that it can reasonably determine both the number of cell types and the rank of the NMF. Experiments on eight real scRNA-seq datasets show that MscNMF can effectively perform clustering tasks and extract useful genetic markers. To verify its clustering performance, the framework is compared with other recent clustering algorithms, and satisfactory results are obtained. The code of MscNMF is freely available for academic use (https://github.com/wangchuanyuan1/project-MscNMF).
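The three stages named in the abstract (data decomposition, similarity learning, similarity fusion) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the linked repository. It assumes standard multiplicative-update NMF for the decomposition, cosine similarity between cell embeddings for the similarity step, and a plain average across ranks for the fusion. The function names `nmf`, `cosine_similarity`, and `fused_similarity`, the choice of ranks, and the genes-by-cells layout of `X` are all hypothetical.

```python
import numpy as np


def nmf(X, rank, n_iter=200, seed=0, eps=1e-10):
    """Approximate X (genes x cells) as W @ H with W, H >= 0.

    Uses generic multiplicative updates for the Frobenius loss,
    an assumption here rather than the paper's own objective.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, rank)) + eps  # gene features
    H = rng.random((rank, n_cells)) + eps  # cell features
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def cosine_similarity(H, eps=1e-10):
    """Cell-by-cell cosine similarity of the columns of H."""
    norms = np.linalg.norm(H, axis=0, keepdims=True) + eps
    Hn = H / norms
    return Hn.T @ Hn


def fused_similarity(X, ranks=(2, 3, 4)):
    """Average the per-subspace similarities (a stand-in for fusion)."""
    sims = [cosine_similarity(nmf(X, r, seed=r)[1]) for r in ranks]
    return np.mean(sims, axis=0)


# Toy data: 20 genes x 12 cells, strictly positive, two cell groups.
rng = np.random.default_rng(42)
X_demo = rng.random((20, 12)) + 0.1
X_demo[:10, :6] += 5.0   # group A expresses the first 10 genes
X_demo[10:, 6:] += 5.0   # group B expresses the last 10 genes
S_demo = fused_similarity(X_demo)
```

The fused matrix `S_demo` could then be passed to any clustering method that accepts a precomputed affinity. Each rank defines one low-dimensional subspace, so averaging over several ranks is one simple way to combine the views. The paper's fusion step and its procedure for choosing the number of subpopulations may differ.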