Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, United States of America.
Science, Math and Computer Science Magnet Program, Poolesville High School, Poolesville, MD, United States of America.
PLoS Comput Biol. 2019 Apr 19;15(4):e1006935. doi: 10.1371/journal.pcbi.1006935. eCollection 2019 Apr.
Bioinformatics techniques for analyzing time-course bulk and single-cell omics data are advancing. Because the true dynamics of the underlying molecular changes are unknown, benchmarking these techniques on real data is difficult. Realistic simulated time-course datasets are therefore essential for assessing the performance of time-course bioinformatics algorithms. We develop an R/Bioconductor package, CancerInSilico, to simulate bulk and single-cell transcriptional data from a known ground truth obtained from mathematical models of cellular systems. The package contains a general R infrastructure for running cell-based models and simulating gene expression data based on the model states. We show how to use the package to simulate a gene expression dataset and then benchmark analysis methods on that dataset against its known ground truth. The package is freely available via Bioconductor: http://bioconductor.org/packages/CancerInSilico/.
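The benchmarking idea described above can be illustrated with a minimal, language-agnostic sketch. It is written in Python for brevity and does not use or reproduce the CancerInSilico API: the logistic growth model, the linear expression model, and every function name and parameter below are hypothetical stand-ins chosen for illustration. A simple mathematical model supplies the ground truth, noisy gene expression is generated from the model state, and a toy analysis method is scored against that truth.

```python
import math
import random


def simulate_growth_fraction(n_steps, n0=10.0, capacity=1000.0, rate=0.3):
    """Hypothetical cell model: discrete logistic growth.

    Returns the ground-truth fraction of proliferating cells at each step.
    """
    n = n0
    truth = []
    for _ in range(n_steps):
        growth_fraction = 1.0 - n / capacity
        truth.append(growth_fraction)
        n += rate * n * growth_fraction
    return truth


def simulate_expression(truth, n_genes=5, noise_sd=0.05, seed=0):
    """Assumed expression model: each gene is a baseline plus a gain times
    the model state, with additive Gaussian noise."""
    rng = random.Random(seed)
    baselines = [rng.uniform(1.0, 3.0) for _ in range(n_genes)]
    gains = [rng.uniform(2.0, 4.0) for _ in range(n_genes)]
    return [
        [b + g * a + rng.gauss(0.0, noise_sd) for a in truth]
        for b, g in zip(baselines, gains)
    ]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Simulate a dataset whose underlying dynamics are known.
truth = simulate_growth_fraction(40)
expr = simulate_expression(truth)

# Toy "analysis method": estimate pathway activity as the mean across genes.
estimate = [sum(gene[t] for gene in expr) / len(expr) for t in range(len(truth))]

# Benchmark the estimate against the known ground truth.
score = pearson(estimate, truth)
print(f"correlation with ground truth: {score:.3f}")
```

Because the simulated activity is known exactly, any analysis method can be swapped in for the mean-across-genes estimate and compared on the same score.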