Department of Computer Science, City University of Hong Kong, Hong Kong.
Nucleic Acids Res. 2023 Aug 25;51(15):e81. doi: 10.1093/nar/gkad570.
Single-cell sequencing technology enables the simultaneous capture of multiomic data from multiple cells. The captured data can be represented by tensors, i.e. higher-order generalizations of matrices. However, existing analysis tools often treat the data as a collection of second-order matrices, discarding the correspondences among the features. We therefore propose a probabilistic tensor decomposition framework, SCOIT, to extract embeddings from single-cell multiomic data. SCOIT incorporates various distributions, including Gaussian, Poisson, and negative binomial distributions, to deal with sparse, noisy, and heterogeneous single-cell data. Our framework decomposes a multiomic tensor into a cell embedding matrix, a gene embedding matrix, and an omic embedding matrix, allowing for various downstream analyses. We applied SCOIT to eight single-cell multiomic datasets from different sequencing protocols. With the cell embeddings, SCOIT achieves superior cell clustering performance compared to nine state-of-the-art tools under various metrics, demonstrating its ability to dissect cellular heterogeneity. With the gene embeddings, SCOIT enables cross-omics gene expression analysis and integrative gene regulatory network study. Furthermore, the embeddings simultaneously enable cross-omics imputation, outperforming current imputation methods with the Pearson correlation coefficient increased by 3.38-39.26%. Moreover, SCOIT accommodates scenarios in which subsets of the cells have only one omic profile available.
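To make the idea of decomposing a cells × genes × omics tensor into three embedding matrices concrete, the following is a minimal sketch, not the SCOIT implementation. It assumes a CP-style trilinear model fitted by plain alternating least squares, which corresponds only to a Gaussian (squared-error) setting; the Poisson and negative binomial variants mentioned in the abstract are not covered. The function name `cp_als`, the toy dimensions, and the rank are all hypothetical choices made for illustration.

```python
import numpy as np


def cp_als(X, rank, n_iter=100, seed=0):
    """Fit X[i, j, k] ~ sum_r C[i, r] * G[j, r] * O[k, r] by alternating least squares.

    C, G, O play the role of cell, gene, and omic embedding matrices.
    This is a generic least-squares CP decomposition (an assumption),
    not the probabilistic model described in the paper.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes, n_omics = X.shape
    C = rng.standard_normal((n_cells, rank))
    G = rng.standard_normal((n_genes, rank))
    O = rng.standard_normal((n_omics, rank))
    errors = []
    for _ in range(n_iter):
        # Each update solves a linear least-squares problem with the
        # other two factor matrices held fixed.
        C = np.einsum('ijk,jr,kr->ir', X, G, O) @ np.linalg.pinv((G.T @ G) * (O.T @ O))
        G = np.einsum('ijk,ir,kr->jr', X, C, O) @ np.linalg.pinv((C.T @ C) * (O.T @ O))
        O = np.einsum('ijk,ir,jr->kr', X, C, G) @ np.linalg.pinv((C.T @ C) * (G.T @ G))
        # Relative reconstruction error after this sweep.
        X_hat = np.einsum('ir,jr,kr->ijk', C, G, O)
        errors.append(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
    return C, G, O, errors


# Toy data: a synthetic low-rank tensor plus a little noise (not real single-cell data).
rng = np.random.default_rng(1)
n_cells, n_genes, n_omics, rank = 40, 25, 3, 3
C_true = rng.standard_normal((n_cells, rank))
G_true = rng.standard_normal((n_genes, rank))
O_true = rng.standard_normal((n_omics, rank))
X = np.einsum('ir,jr,kr->ijk', C_true, G_true, O_true)
X = X + 0.01 * rng.standard_normal(X.shape)

C, G, O, errors = cp_als(X, rank=rank)
print("relative error after first and last sweep:", errors[0], errors[-1])
```

In a sketch like this, the rows of `C` could be passed to a clustering routine and the reconstructed tensor could serve as a source of imputed values, mirroring the downstream uses the abstract describes. How SCOIT itself handles missing entries and non-Gaussian likelihoods is not specified here.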