Charles Perkins Centre, The University of Sydney, Sydney, Australia.
School of Mathematics and Statistics, The University of Sydney, Sydney, Australia.
Nat Commun. 2021 Nov 25;12(1):6911. doi: 10.1038/s41467-021-27130-w.
Single-cell RNA-seq (scRNA-seq) data simulation is critical for evaluating computational methods for analysing scRNA-seq data, especially when ground truth is experimentally unattainable. The reliability of such an evaluation depends on the ability of simulation methods to capture properties of experimental data. However, while many scRNA-seq data simulation methods have been proposed, a systematic evaluation of these methods is lacking. We develop a comprehensive evaluation framework, SimBench, which includes a kernel density estimation measure, and use it to benchmark 12 simulation methods across 35 scRNA-seq experimental datasets. We evaluate the simulation methods on a panel of data properties, their ability to maintain biological signals, their scalability and their applicability. Our benchmark uncovers performance differences among the methods and highlights that data characteristics vary in how difficult they are to simulate. Furthermore, we identify several limitations, including in maintaining heterogeneity of distribution. These results, together with the framework and datasets made publicly available as R packages, will guide the selection of simulation methods and their future development.