Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC 3052, Australia.
Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia.
Bioinformatics. 2020 Apr 1;36(7):2288-2290. doi: 10.1093/bioinformatics/btz889.
Bioinformatic analysis of single-cell gene expression data is a rapidly evolving field. Hundreds of bespoke methods have been developed in the past few years to deal with various aspects of single-cell analysis, and consensus on the most appropriate methods to use under different settings is still emerging. Benchmarking these methods is therefore of critical importance. Because analysis of single-cell data usually involves multi-step pipelines, pipelines built from different combinations of methods must also be evaluated effectively. Current benchmarks of single-cell methods are mostly implemented with ad hoc code that is often difficult to reproduce or extend, and exhaustive manual coding of many combinations is infeasible in most instances. Therefore, new software is needed to manage pipeline benchmarking.
The CellBench R software facilitates method comparisons in either a task-centric or combinatorial way, allowing pipelines of methods to be evaluated effectively. CellBench automatically runs combinations of methods, provides facilities for measuring running time and delivers output in a tabular form that is highly compatible with tidyverse R packages for summary and visualization. Our software has enabled comprehensive benchmarking of single-cell RNA-seq normalization, imputation, clustering, trajectory analysis and data integration methods using various performance metrics obtained from data with available ground truth. CellBench is also amenable to benchmarking other bioinformatics analysis tasks.
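The combinatorial idea described above can be illustrated with a minimal sketch. This is not CellBench's API (CellBench is an R package); it is a standard-library Python illustration of running every combination of methods across pipeline stages, timing each run and collecting one row per combination. The toy methods, the dataset and the `run_pipelines` helper are all hypothetical names introduced here for illustration.

```python
import itertools
import time

# Hypothetical toy "methods": each pipeline stage maps a method name to a function.
normalization = {
    "none": lambda xs: list(xs),
    "scale_to_unit_sum": lambda xs: [x / sum(xs) for x in xs],
}
transformation = {
    "identity": lambda xs: list(xs),
    "square": lambda xs: [x * x for x in xs],
}

# Hypothetical toy dataset standing in for an expression matrix.
datasets = {"toy": [1.0, 2.0, 3.0, 4.0]}


def run_pipelines(datasets, *stages):
    """Run every combination of one method per stage on every dataset.

    Returns a list of dict rows (a simple tabular form) holding the dataset
    name, the method names used, the result and the elapsed wall-clock time.
    """
    rows = []
    for data_name, data in datasets.items():
        # Cartesian product: one (name, function) pair chosen from each stage.
        for combo in itertools.product(*(stage.items() for stage in stages)):
            start = time.perf_counter()
            result = data
            for _, fn in combo:
                result = fn(result)
            elapsed = time.perf_counter() - start
            rows.append(
                {
                    "data": data_name,
                    "methods": tuple(name for name, _ in combo),
                    "result": result,
                    "seconds": elapsed,
                }
            )
    return rows


results = run_pipelines(datasets, normalization, transformation)
for row in results:
    print(row["data"], row["methods"], row["result"])
```

Two stages with two methods each give four pipelines per dataset. Adding a method to any stage extends the table without further hand-written combinations, which is the property the abstract attributes to combinatorial benchmarking.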
Available from https://bioconductor.org/packages/CellBench.