Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK.
Nucleic Acids Res. 2021 Apr 19;49(7):e42. doi: 10.1093/nar/gkab004.
As the cost of single-cell RNA-seq experiments has decreased, an increasing number of datasets are now available. Combining newly generated and publicly accessible datasets is challenging due to non-biological signals, commonly known as batch effects. Although several computational methods can remove batch effects, evaluating which method performs best is not straightforward. Here, we present BatchBench (https://github.com/cellgeni/batchbench), a modular and flexible pipeline for comparing batch correction methods for single-cell RNA-seq data. We apply BatchBench to eight methods, highlighting their methodological differences and assessing their performance and computational requirements through a compendium of well-studied datasets. This systematic comparison guides users in the choice of batch correction tool, and the pipeline makes it easy to evaluate other datasets.