Zhang Wenyu, Lin Zhixiang
Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China.
Front Genet. 2023 Feb 7;14:998504. doi: 10.3389/fgene.2023.998504. eCollection 2023.
Single-cell multiomics technologies, which measure transcriptomic and epigenomic profiles simultaneously in the same set of single cells, pose significant challenges for effective integrative analysis. Here, we propose an unsupervised generative model, iPoLNG, for the effective and scalable integration of single-cell multiomics data. iPoLNG models the discrete counts in single-cell multiomics data with latent factors and reconstructs low-dimensional representations of the cells and features using computationally efficient stochastic variational inference. The low-dimensional representation of cells enables the identification of distinct cell types, and the feature-by-factor loading matrices help characterize cell-type-specific markers and provide rich biological insights through functional pathway enrichment analysis. iPoLNG can also handle the partial-information setting, in which a modality is missing for some of the cells. Taking advantage of GPU acceleration and probabilistic programming, iPoLNG scales to large datasets, running in less than 15 min on datasets with 20,000 cells.
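To make the idea of modelling multiomics counts with shared latent factors concrete, the following is a minimal sketch and not the iPoLNG implementation. It assumes a plain Poisson factor model in which two modalities share one cell-by-factor matrix, and it swaps in maximum-likelihood multiplicative updates (the classic KL-divergence NMF updates) for the stochastic variational inference named in the abstract. All function names, priors, and sizes (`simulate_counts`, `fit_shared_factors`, `neg_loglik`, 60 cells, 3 factors) are hypothetical choices for illustration.

```python
import numpy as np


def simulate_counts(n_cells=60, n_factors=3, n_features=(40, 50), seed=0):
    """Simulate two count matrices that share one cell-by-factor matrix (toy data)."""
    rng = np.random.default_rng(seed)
    # Shared cell-by-factor matrix; each row lies on the simplex (an assumption).
    theta = rng.dirichlet(np.full(n_factors, 0.3), size=n_cells)
    counts, loadings = [], []
    for p in n_features:
        # Modality-specific factor-by-feature loadings.
        beta = rng.gamma(shape=0.5, scale=1.0, size=(n_factors, p))
        # Per-cell scaling that mimics sequencing depth.
        depth = rng.uniform(20, 40, size=(n_cells, 1))
        counts.append(rng.poisson(depth * (theta @ beta)))
        loadings.append(beta)
    return counts, theta, loadings


def neg_loglik(X, rate, eps=1e-10):
    """Negative Poisson log-likelihood, dropping the term that is constant in rate."""
    return float(np.sum(rate - X * np.log(rate + eps)))


def fit_shared_factors(counts, n_factors=3, n_iter=200, seed=0, eps=1e-10):
    """Fit X ~ Poisson(W @ H) on the concatenated modalities.

    Uses multiplicative updates, NOT stochastic variational inference.
    Returns the shared cell factors W and one loading matrix per modality.
    """
    rng = np.random.default_rng(seed)
    X = np.concatenate(counts, axis=1).astype(float)
    n, p = X.shape
    W = rng.uniform(0.5, 1.5, size=(n, n_factors))
    H = rng.uniform(0.5, 1.5, size=(n_factors, p))
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= ((X / WH) @ H.T) / (H.sum(axis=1) + eps)
        WH = W @ H + eps
        H *= (W.T @ (X / WH)) / (W.sum(axis=0)[:, None] + eps)
    # Split the loadings back into one block per modality.
    splits = np.cumsum([c.shape[1] for c in counts])[:-1]
    return W, np.split(H, splits, axis=1)


counts, theta_true, _ = simulate_counts()
W, loadings = fit_shared_factors(counts)
print(W.shape, [h.shape for h in loadings])
```

In this sketch the rows of `W` play the role of the low-dimensional cell representation, which could be clustered to look for cell types, and each entry of `loadings` is the transpose of a feature-by-factor loading matrix for one modality. Handling cells with a missing modality would require masking those entries in the likelihood, which is not shown here.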