Saskia Freytag, Luyi Tian, Ingrid Lönnstedt, Milica Ng, Melanie Bahlo
Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia.
Department of Medical Biology, University of Melbourne, Parkville, Australia.
F1000Res. 2018 Aug 15;7:1297. doi: 10.12688/f1000research.15809.2. eCollection 2018.
The commercially available 10x Genomics protocol for generating droplet-based single-cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or identical cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regard to which method to use. Here we use one gold standard 10x Genomics dataset, generated from a mixture of three cell lines, as well as multiple silver standard 10x Genomics datasets generated from peripheral blood mononuclear cells, to examine not only the accuracy but also the running time and robustness of a dozen methods. We found that Seurat outperformed the other methods, although performance seems to depend on many factors, including the complexity of the studied system. Furthermore, we found that the solutions produced by different methods have little in common with each other. In light of this, we conclude that the choice of clustering tool crucially determines the interpretation of scRNA-seq data generated by 10x Genomics. Hence practitioners and consumers should remain vigilant about the outcome of 10x Genomics scRNA-seq analysis.
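The abstract does not name the metric used to score accuracy or the overlap between methods. One common way to compare two flat clusterings of the same cells is the adjusted Rand index (ARI): 1 means identical partitions up to relabeling, and values near 0 mean chance-level agreement. It can be used either against known labels (such as the three cell lines) or between two methods' solutions. Below is a minimal pure-Python sketch, assuming each clustering is an equal-length list of per-cell labels. The function name `adjusted_rand_index` is our own and is not taken from any of the evaluated tools.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat clusterings of the same cells.

    Hypothetical helper for illustration; labels may be any hashable values.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same cells")
    n = len(labels_a)
    # Contingency table: number of cells in each (cluster_a, cluster_b) pair.
    pair_counts = Counter(zip(labels_a, labels_b))
    # Cell pairs grouped together by both clusterings.
    index = sum(comb(c, 2) for c in pair_counts.values())
    # Cell pairs grouped together by each clustering separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    # Expected index under random labeling with the same cluster sizes.
    expected = sum_a * sum_b / total_pairs if total_pairs else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. both put every cell in one cluster): treat as identical.
        return 1.0
    return (index - expected) / (max_index - expected)


# Two methods that find the same three groups under different cluster IDs agree perfectly.
print(adjusted_rand_index([0, 0, 1, 1, 2, 2], [5, 5, 3, 3, 9, 9]))
```

Because the ARI is corrected for chance, it allows solutions with different numbers of clusters, as different methods often produce, to be compared on a common scale.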