Du Yuheng, Huang Qianhui, Arisdakessian Cedric, Garmire Lana X
Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48105.
University of Hawaii at Manoa, Department of Information and Computer Science, Honolulu, HI, 96816.
G3 (Bethesda). 2020 May 4;10(5):1775-1783. doi: 10.1534/g3.120.401160.
Alignment of scRNA-Seq data is the first and one of the most critical steps of the scRNA-Seq analysis workflow, and thus the choice of a proper aligner is of paramount importance. Recently, STAR, an alignment method, and Kallisto, a pseudoalignment method, have both gained wide popularity in the single-cell sequencing field. However, an unbiased third-party comparison of these two methods in scRNA-Seq is lacking. Here we conduct a systematic comparison of them on a variety of Drop-seq, Fluidigm and 10x Genomics data, from the aspects of gene abundance, alignment accuracy, and computational speed and memory use. We observe that STAR globally detects more genes and yields higher gene-expression values than Kallisto, and also than Bowtie2, another popular alignment method for bulk RNA-Seq. STAR also yields higher correlations of the Gini index for the genes with RNA-FISH validation results. On 10x Genomics PBMC 3K scRNA-Seq and mouse cortex single-nuclei RNA-Seq data, STAR shows similar or better cell-type annotation results by detecting a larger subset of known gene markers. However, the gain in accuracy and gene abundance from STAR alignment comes at the price of significantly slower computation (4-fold) and greater memory use (7.7-fold) compared to Kallisto.
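The abstract compares aligners partly through the Gini index of gene expression, validated against RNA-FISH. It does not say how the index was computed, so the following is only a minimal sketch of the standard Gini coefficient applied per gene across cells. The function names `gini` and `gini_per_gene`, and the cells-by-genes layout of the count matrix, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def gini(values):
    """Standard Gini coefficient of a vector of non-negative values.

    Returns 0.0 for an empty or all-zero vector (a convention assumed here).
    """
    x = np.sort(np.asarray(values, dtype=float))  # ascending order
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def gini_per_gene(counts):
    """Gini coefficient of each gene (column) across cells (rows)."""
    counts = np.asarray(counts, dtype=float)
    return np.array([gini(counts[:, j]) for j in range(counts.shape[1])])


if __name__ == "__main__":
    # Toy matrix: 4 cells x 2 genes. Gene 0 is uniform, gene 1 is
    # expressed in a single cell.
    toy = np.array([[1, 0], [1, 0], [1, 0], [1, 4]])
    print(gini_per_gene(toy))
```

Under these assumptions, per-gene Gini values from each aligner's count matrix could then be correlated with Gini values from an RNA-FISH reference, for example with `np.corrcoef`, to obtain the kind of comparison the abstract describes.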