Yuan Hui, Liu Mingzhu, Qiu Yushan, Ching Wai-Ki, Zou Quan
School of Mathematical Sciences, Shenzhen University, Shenzhen, China.
Institute for Advanced Study, Shenzhen University, Shenzhen, China.
PLoS Comput Biol. 2025 Aug 18;21(8):e1013375. doi: 10.1371/journal.pcbi.1013375. eCollection 2025 Aug.
The development of single-cell multi-omics sequencing technologies has enabled the simultaneous analysis of multiple omic layers within the same cell. Accurate clustering of these cells is crucial for downstream analyses of complex biological functions. Despite significant advances in multi-omics integration, current methods have two major limitations. First, they inadequately incorporate prior biological knowledge from the various omic layers. Second, they often perform dimensionality reduction independently on each omic dataset, thereby failing to capture the intrinsic complementary information and potentially overlooking crucial cross-platform interactions. Motivated by these limitations, this study investigates a non-negative matrix factorization model called PLNMFG, which integrates two components into one joint framework: unified latent representation learning, which retains the features between and within omics, and cluster structure learning, which retains the intrinsic structure of the data. Specifically, PLNMFG performs adaptive imputation to handle dropout events and uses prior pseudo-labels as constraints during collective non-negative matrix factorization. As a result, it obtains a more robust latent representation that preserves the double similarity information. A graph Laplacian constraint is applied during clustering, which further preserves the structural characteristics of the multi-omics data. In addition, the weight of each omic is learned adaptively based on that omic's contribution. A series of experiments on eight benchmark datasets shows that our model performs well in terms of clustering accuracy and computational efficiency.
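To make the core idea of the abstract more concrete, the following is a minimal sketch of collective (joint) non-negative matrix factorization with a graph Laplacian penalty. It is not the PLNMFG algorithm itself: the adaptive imputation, pseudo-label constraints, and adaptive omic weighting described above are omitted, and the weights are simply held fixed. The objective used here, sum_v w_v ||X_v - W_v H||_F^2 + lam * tr(H L H^T), the standard multiplicative updates, the binary kNN graph, and all names (`knn_affinity`, `joint_graph_nmf`, `lam`, `weights`) are assumptions chosen for illustration.

```python
import numpy as np


def knn_affinity(X, k=5):
    """Binary symmetric kNN graph over cells; X has shape (features, cells)."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # squared distances
    np.fill_diagonal(d2, np.inf)                      # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                         # symmetrize


def joint_graph_nmf(Xs, n_components, lam=1.0, weights=None,
                    n_iter=200, k=5, seed=0, eps=1e-10):
    """Factor each omic X_v ~ W_v @ H with one shared cell embedding H.

    Xs: list of non-negative arrays, each (features_v, cells).
    Returns (Ws, H, losses). Weights are fixed here (an assumption),
    not learned adaptively as in the paper.
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[1]
    if weights is None:
        weights = np.ones(len(Xs))
    weights = np.asarray(weights, dtype=float)

    A = knn_affinity(np.vstack(Xs), k)   # graph built on concatenated omics
    D = np.diag(A.sum(axis=1))
    L = D - A                            # unnormalized graph Laplacian

    Ws = [rng.random((X.shape[0], n_components)) for X in Xs]
    H = rng.random((n_components, n))
    losses = []
    for _ in range(n_iter):
        # Update each omic-specific basis W_v with H held fixed.
        for i, X in enumerate(Xs):
            Ws[i] *= (X @ H.T) / (Ws[i] @ (H @ H.T) + eps)
        # Update the shared representation H using all omics and the graph.
        num = sum(w * (W.T @ X) for w, W, X in zip(weights, Ws, Xs)) + lam * (H @ A)
        den = sum(w * (W.T @ W @ H) for w, W in zip(weights, Ws)) + lam * (H @ D) + eps
        H *= num / den
        recon = sum(w * np.sum((X - W @ H) ** 2)
                    for w, W, X in zip(weights, Ws, Xs))
        losses.append(recon + lam * np.sum((H @ L) * H))
    return Ws, H, losses


# Toy usage: two synthetic "omics" sharing three cell groups.
rng = np.random.default_rng(1)
true_labels = np.repeat(np.arange(3), 20)            # 60 cells
Z = np.eye(3)[true_labels].T                         # (3, 60) group indicators
X_rna = rng.random((40, 3)) @ Z + 0.05 * rng.random((40, 60))
X_atac = rng.random((25, 3)) @ Z + 0.05 * rng.random((25, 60))
Ws, H, losses = joint_graph_nmf([X_rna, X_atac], n_components=3, lam=0.1)
pred_labels = H.argmax(axis=0)                       # crude cluster assignment
```

Taking the arg-max over the rows of `H` is only one simple way to read clusters off the shared representation; running k-means on the columns of `H` would be an equally reasonable choice in this sketch.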