Wang Zhe, Yang Shiyi, Koga Yusuke, Corbett Sean E, Shea Conor V, Johnson W Evan, Yajima Masanao, Campbell Joshua D
Bioinformatics Program, Boston University, Boston, MA, USA.
Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA.
NAR Genom Bioinform. 2022 Sep 13;4(3):lqac066. doi: 10.1093/nargab/lqac066. eCollection 2022 Sep.
Single-cell RNA-seq (scRNA-seq) has emerged as a powerful technique to quantify gene expression in individual cells and to elucidate the molecular and cellular building blocks of complex tissues. We developed a novel Bayesian hierarchical model called Cellular Latent Dirichlet Allocation (Celda) to perform co-clustering of genes into transcriptional modules and cells into subpopulations. Celda can quantify the probabilistic contribution of each gene to each module, each module to each cell population, and each cell population to each sample. In a peripheral blood mononuclear cell dataset, Celda identified a subpopulation of proliferating T cells and a plasma cell, both of which were missed by two other common single-cell workflows. Celda also identified transcriptional modules that could be used to characterize unique and shared biological programs across cell types. Finally, Celda outperformed other approaches for clustering genes into modules on simulated data. Celda presents a novel method for characterizing transcriptional programs and cellular heterogeneity in scRNA-seq data.
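The three-level hierarchy described in the abstract (genes contribute to modules, modules to cell populations, and cell populations to samples) can be illustrated with a small toy example. The sketch below is not Celda's model or inference procedure, and it does not use the celda package; it only assumes, for illustration, that each level can be written as a matrix whose rows are probability distributions drawn from a symmetric Dirichlet. All dimensions and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions, chosen only for illustration.
n_samples, n_populations, n_modules, n_genes = 2, 3, 4, 10

# Assumption: each level is a row-stochastic matrix, with every row
# drawn from a symmetric Dirichlet distribution.
sample_to_population = rng.dirichlet(np.ones(n_populations), size=n_samples)
population_to_module = rng.dirichlet(np.ones(n_modules), size=n_populations)
module_to_gene = rng.dirichlet(np.ones(n_genes), size=n_modules)

# Chaining the levels gives the implied gene distribution for each
# cell population, and then for each sample.
population_to_gene = population_to_module @ module_to_gene
sample_to_gene = sample_to_population @ population_to_gene

print(sample_to_gene.shape)        # (n_samples, n_genes)
print(sample_to_gene.sum(axis=1))  # each row sums to 1 (up to rounding)
```

Because the product of row-stochastic matrices is itself row-stochastic, each row of `sample_to_gene` remains a valid probability distribution over genes, which is what allows contributions at one level to be read through to the next.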