Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, Australia.
Genome Biol. 2021 Dec 14;22(1):339. doi: 10.1186/s13059-021-02552-3.
Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have developed rapidly in recent years. These include preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. Although several packaged preprocessing workflows give users convenient tools for this step, how they compare with one another and how they influence downstream analysis have not been well studied.
Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets of varying biological complexity generated on the CEL-Seq2 and 10x Chromium platforms. We compare these workflows directly in terms of their quantification properties and, by evaluating different method combinations, in terms of their impact on normalization and clustering. Although the workflows vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods almost all combinations produce clustering results that agree well with the known cell type labels used as ground truth in our analysis.
In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
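The agreement between clustering results and known cell type labels described above can be scored in several ways. The abstract does not name the metric used, so the sketch below assumes the adjusted Rand index (ARI), a common choice for this kind of comparison. The function name `adjusted_rand_index` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand index between two labelings of the same cells.

    Assumption: ARI is used here only as an illustrative agreement
    score; the source text does not state which metric was applied.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("both labelings must cover the same cells")
    n = len(true_labels)
    if n < 2:
        # With fewer than two cells there are no pairs to disagree on.
        return 1.0

    # Contingency counts: cells per (true label, cluster) pair,
    # and the marginal counts per true label and per cluster.
    pair_counts = Counter(zip(true_labels, cluster_labels))
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)

    # Number of cell pairs that share a group under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_cluster = sum(comb(c, 2) for c in cluster_counts.values())

    expected = sum_true * sum_cluster / comb(n, 2)  # chance agreement
    max_index = (sum_true + sum_cluster) / 2        # upper bound
    if max_index == expected:
        # Degenerate case, e.g. every cell in one group in both labelings.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Toy example: clusters match the cell types up to renaming.
cell_types = ["A", "A", "B", "B"]
clusters = [1, 1, 0, 0]
score = adjusted_rand_index(cell_types, clusters)
```

A score of 1 means the clustering reproduces the reference labels exactly (cluster names do not matter), and values near 0 indicate agreement no better than chance. In a benchmark of this kind, such a score would be computed once per combination of preprocessing workflow, normalization method, and clustering method.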