University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai, 200240, China.
Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, 15260, USA.
Genome Biol. 2020 Jul 30;21(1):188. doi: 10.1186/s13059-020-02084-2.
Identifying and removing multiplets is essential to improving the scalability and reliability of single-cell RNA sequencing (scRNA-seq). Multiplets create artificial cell types in a dataset. We propose GMM-Demux, a Gaussian mixture model-based multiplet identification method. GMM-Demux accurately identifies and removes multiplets using sample barcoding techniques, including cell hashing and MULTI-seq. It also uses a droplet formation model to authenticate putative cell types discovered in a scRNA-seq dataset. We generated two in-house cell-hashing datasets and compared GMM-Demux against three state-of-the-art sample barcoding classifiers. We show that GMM-Demux is stable and highly accurate, and that it recognizes 9 multiplet-induced fake cell types in a PBMC dataset.
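To illustrate the general idea of mixture-model-based demultiplexing, the following is a minimal sketch and not the GMM-Demux implementation. It assumes that each sample barcode (hashtag) count, after a log1p transform, can be modeled by a two-component one-dimensional Gaussian mixture (background versus positive), and that a droplet positive for more than one barcode is a multiplet. The function names, the log1p transform, the quartile-based initialization, and the 0.5 posterior threshold are all illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_gaussians(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative, not GMM-Demux)."""
    n = len(values)
    ordered = sorted(values)
    # Assumption: initialise the two means from the lower and upper quartiles.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    centre = sum(values) / n
    spread = max(sum((v - centre) ** 2 for v in values) / n, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in range(2)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances (variance is floored).
        for k in range(2):
            mass = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = mass / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / mass
            variances[k] = max(
                sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / mass, 1e-6
            )
    return weights, means, variances


def positive_probability(value, params):
    """Posterior probability that a value belongs to the higher-mean component."""
    weights, means, variances = params
    hi = 0 if means[0] > means[1] else 1
    dens = [weights[k] * normal_pdf(value, means[k], variances[k]) for k in range(2)]
    total = sum(dens)
    return dens[hi] / total if total > 0 else 0.0


def classify_droplets(counts, threshold=0.5):
    """Label each droplet from a droplets-by-hashtags table of raw counts."""
    n_tags = len(counts[0])
    # Assumption: a log1p transform stands in for whatever normalisation is used in practice.
    logged = [[math.log1p(c) for c in row] for row in counts]
    models = [fit_two_gaussians([row[t] for row in logged]) for t in range(n_tags)]
    calls = []
    for row in logged:
        positive = [
            t for t in range(n_tags)
            if positive_probability(row[t], models[t]) > threshold
        ]
        if not positive:
            label = "negative"
        elif len(positive) == 1:
            label = "singlet"
        else:
            label = "multiplet"
        calls.append((label, positive))
    return calls
```

Calling `classify_droplets` on a list of per-droplet count rows (one column per hashtag) returns, for each droplet, a label and the indices of the hashtags called positive. Fitting each hashtag independently keeps the sketch short. It does not model the droplet formation process that the abstract describes for authenticating putative cell types.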