School of Mathematical Sciences, Shenzhen University, 518000, Guangdong, China.
College of Life and Health Sciences, Northeastern University, Shenyang, 110169, China.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae228.
The technology for analyzing single-cell multi-omics data has advanced rapidly, providing comprehensive and accurate cellular information by exploring cell heterogeneity in genomics, transcriptomics, epigenomics, metabolomics and proteomics data. However, because single-cell multi-omics data are high-dimensional and sparse, and because existing analysis algorithms have their own limitations, clustering performance is generally poor. Matrix factorization is an unsupervised, dimensionality reduction-based method that can cluster individuals and discover related omics variables from different blocks. Here, we present scMNMF, a novel algorithm that uses non-negative matrix factorization to perform joint dimensionality reduction learning and cell clustering analysis on single-cell multi-omics data. We formulate the objective function of joint learning as a constrained optimization problem and derive the corresponding update formulas through an alternating iterative algorithm. The major advantage of the scMNMF algorithm is its capability to explore hidden related features among omics data. Additionally, feature selection for dimensionality reduction and cell clustering influence each other iteratively, leading to more effective discovery of cell types. We validated the performance of scMNMF on two simulated and five real datasets. The results show that scMNMF outperformed seven other state-of-the-art algorithms across a range of evaluation measures.
scMNMF code can be found at https://github.com/yushanqiu/scMNMF.
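For readers unfamiliar with joint matrix factorization, the following is a minimal sketch of the general idea the abstract describes: several non-negative omics blocks measured on the same cells are factored with a shared cell-level factor, and the factors are updated in alternation. It is not the scMNMF objective or the authors' code (see the repository above for that). It assumes a plain squared-error loss with standard multiplicative updates and no additional constraints, and the names `joint_nmf`, `reconstruction_error`, `blocks`, `Ws` and `H` are hypothetical.

```python
import numpy as np


def joint_nmf(blocks, k, n_iter=200, eps=1e-10, seed=0):
    """Generic joint NMF sketch (an assumption, not the scMNMF objective).

    Each block X_v has shape (features_v, cells) and is approximated as
    W_v @ H, where H (k x cells) is shared across all blocks.
    """
    rng = np.random.default_rng(seed)
    n_cells = blocks[0].shape[1]
    Ws = [rng.random((X.shape[0], k)) for X in blocks]
    H = rng.random((k, n_cells))
    for _ in range(n_iter):
        # Step 1: update each block-specific W_v with the shared H held fixed.
        for v, X in enumerate(blocks):
            Ws[v] *= (X @ H.T) / (Ws[v] @ (H @ H.T) + eps)
        # Step 2: update the shared H with every W_v held fixed.
        numerator = sum(W.T @ X for W, X in zip(Ws, blocks))
        denominator = sum(W.T @ W for W in Ws) @ H + eps
        H *= numerator / denominator
    return Ws, H


def reconstruction_error(blocks, Ws, H):
    """Sum of squared Frobenius errors over all blocks."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, W in zip(blocks, Ws))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two synthetic non-negative blocks sharing the same 40 cells.
    blocks = [rng.random((30, 40)), rng.random((20, 40))]
    Ws, H = joint_nmf(blocks, k=3)
    # Simple heuristic (an assumption): assign each cell to its largest factor.
    labels = H.argmax(axis=0)
    print(labels.shape, reconstruction_error(blocks, Ws, H))
```

Because the cell factor `H` is shared, every block contributes to the same low-dimensional cell representation, which is what lets the dimensionality reduction and the clustering inform each other. Reading cluster labels off the largest entry of each column of `H` is only one simple option.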