Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, 36849, USA.
Nat Commun. 2019 Apr 9;10(1):1649. doi: 10.1038/s41467-019-09639-3.
Recently developed droplet-based single-cell transcriptome sequencing (scRNA-seq) technology makes it feasible to perform population-scale scRNA-seq studies, in which the transcriptome is measured for tens of thousands of single cells from multiple individuals. Although many clustering methods have been developed, few are tailored to population-scale scRNA-seq studies. Here, we develop BAMM-SC, a Bayesian mixture model for single-cell sequencing, to cluster scRNA-seq data from multiple individuals simultaneously. BAMM-SC takes raw count data as input and accounts for data heterogeneity and batch effects among individuals in a unified Bayesian hierarchical model framework. Extensive simulation studies and applications of BAMM-SC to in-house experimental scRNA-seq datasets using blood, lung, and skin cells from humans or mice demonstrate that BAMM-SC outperforms existing clustering methods, with considerably improved clustering accuracy, particularly in the presence of heterogeneity among individuals.
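As a rough illustration of what clustering raw counts from several individuals under shared cluster labels can look like, the sketch below scores each cell's count vector under a Dirichlet-multinomial likelihood and assigns it to the best-scoring cluster. The abstract does not state which likelihood BAMM-SC uses, so the Dirichlet-multinomial choice, the function names, the donor names, and the toy parameter values are all assumptions made for illustration. The sketch covers only the assignment step with known parameters, not the Bayesian inference that would estimate them.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a raw count vector under a Dirichlet-multinomial.

    counts: non-negative integer counts, one per gene.
    alpha:  positive concentration parameters, one per gene.
    """
    n = sum(counts)
    a = sum(alpha)
    logp = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        logp += math.lgamma(x + al) - math.lgamma(x + 1) - math.lgamma(al)
    return logp


def assign_cells(cells_by_individual, alpha):
    """Assign every cell to its highest-likelihood cluster.

    cells_by_individual[j] is a list of count vectors for individual j.
    alpha[j][k] is the (hypothetical) concentration vector for cluster k
    in individual j; cluster index k means the same cell type in every
    individual, while the parameter values may differ between individuals.
    """
    labels = {}
    for j, cells in cells_by_individual.items():
        n_clusters = len(alpha[j])
        labels[j] = [
            max(
                range(n_clusters),
                key=lambda k: dirichlet_multinomial_logpmf(c, alpha[j][k]),
            )
            for c in cells
        ]
    return labels


# Toy example: 3 genes, 2 clusters, 2 individuals (all values made up).
alpha = {
    "donor_A": [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]],
    "donor_B": [[6.0, 2.0, 1.0], [1.0, 2.0, 6.0]],  # shifted, same cluster meaning
}
cells = {
    "donor_A": [[9, 1, 0], [0, 2, 7]],
    "donor_B": [[5, 1, 1], [1, 0, 6]],
}
labels = assign_cells(cells, alpha)
print(labels)
```

Letting each individual carry its own parameters while sharing the cluster index is one simple way to express between-individual heterogeneity; a full hierarchical model would additionally tie the individual-level parameters to cluster-level priors and infer both from the data.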