Department of Molecular Life Sciences, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland.
SIB Swiss Institute of Bioinformatics, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland.
Genome Biol. 2023 Feb 10;24(1):23. doi: 10.1186/s13059-023-02859-3.
Quality control (QC) is a critical component of single-cell RNA-seq (scRNA-seq) processing pipelines. Current approaches to QC implicitly assume that a dataset consists of a single cell type, which can lead to biased exclusion of rare cell types. We introduce SampleQC, which robustly fits a Gaussian mixture model across multiple samples, improving sensitivity and reducing bias compared to current approaches. We show via simulations that SampleQC is less prone to excluding rarer cell types. We also demonstrate SampleQC on a complex real dataset (867k cells across 172 samples). SampleQC is general, is implemented in R, and could be applied to other data types.
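The bias described in the abstract can be illustrated with a toy sketch. The Python below is not SampleQC, which is implemented in R and fits its mixture model robustly across multiple samples. It is a one-dimensional illustration under assumed values: a single hypothetical QC metric (log10 library size), an abundant cell type centred at 4.0, a rare cell type centred at 3.2, and a plain (non-robust) two-component EM fit. The function names `mad_keep`, `fit_gmm_1d` and `gmm_keep` are made up for this example. It compares a pooled median/MAD threshold, which treats all cells as one population, with a rule that keeps cells close to any fitted mixture component.

```python
import math
import random
import statistics


def mad_keep(x, n_mads=3.0):
    """Keep values within n_mads scaled MADs of the pooled median."""
    med = statistics.median(x)
    mad = 1.4826 * statistics.median([abs(v - med) for v in x])
    return [abs(v - med) <= n_mads * mad for v in x]


def normal_pdf(v, mu, sd):
    """Density of a normal distribution with mean mu and std. dev. sd."""
    z = (v - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(x, n_iter=100):
    """Fit a two-component 1D Gaussian mixture with plain (non-robust) EM."""
    mus = [min(x), max(x)]            # deterministic start at the extremes
    sds = [statistics.pstdev(x)] * 2  # both components start with the pooled sd
    ws = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value
        resp = []
        for v in x:
            p = [ws[k] * normal_pdf(v, mus[k], sds[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and standard deviations
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mus[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var = sum(r[k] * (v - mus[k]) ** 2 for r, v in zip(resp, x)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
            ws[k] = nk / len(x)
    return ws, mus, sds


def gmm_keep(x, mus, sds, n_sd=3.0):
    """Keep values within n_sd standard deviations of any component mean."""
    return [any(abs(v - m) <= n_sd * s for m, s in zip(mus, sds)) for v in x]


# Simulated metric values (assumed, for illustration only).
rng = random.Random(0)
common = [rng.gauss(4.0, 0.1) for _ in range(950)]  # abundant cell type
rare = [rng.gauss(3.2, 0.1) for _ in range(50)]     # rare cell type
x = common + rare

mad_flags = mad_keep(x)
ws, mus, sds = fit_gmm_1d(x)
gmm_flags = gmm_keep(x, mus, sds)

# The rare cells are the last 50 entries of x.
rare_kept_mad = sum(mad_flags[950:])
rare_kept_gmm = sum(gmm_flags[950:])
print("rare cells kept by pooled MAD rule:", rare_kept_mad, "of", len(rare))
print("rare cells kept by mixture rule:   ", rare_kept_gmm, "of", len(rare))
```

With these simulated values, the pooled rule is expected to discard the rare population because it lies far below the pooled median, whereas the mixture rule judges each cell against the component it resembles and keeps most of the rare cells.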