Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), 8A Biomedical Grove, Immunos Building, Level 3, Singapore, 138648, Singapore.
Genome Biol. 2020 Jan 16;21(1):12. doi: 10.1186/s13059-019-1850-9.
Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study on available batch correction methods to determine the most suitable method for batch-effect removal.
We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics: kBET, LISI, ASW, and ARI. We also investigate the use of batch-corrected data to study differential gene expression.
Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.
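To make one of the four metrics named above concrete, the following is a minimal sketch of the adjusted Rand index (ARI) in plain Python. It is an illustration only, not the authors' benchmarking pipeline: the function name `adjusted_rand_index` and the toy label lists are assumptions introduced here. The ARI compares two labelings of the same cells (for example, known cell types against cluster assignments) and corrects the pair-counting agreement for chance.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    Hypothetical helper for illustration; not taken from the benchmark's code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Contingency counts: how many items fall in each (true, predicted) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs that share a group, per table cell and per margin.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # upper bound on agreement
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single group in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster ids differ from cell-type ids but the grouping is identical.
cell_types = [0, 0, 1, 1]
clusters = [1, 1, 0, 0]
print(adjusted_rand_index(cell_types, clusters))  # 1.0
```

A value of 1 indicates identical groupings regardless of how the groups are named, and values near 0 indicate agreement no better than chance. Computed against cell type labels, a high ARI suggests that cell type purity was preserved after correction.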