Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria.
Department Bioengineering, University of Applied Sciences FH Campus Wien, Vienna, Austria.
BMC Bioinformatics. 2021 May 1;22(1):227. doi: 10.1186/s12859-021-04154-z.
Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation. The results of such benchmarks depend on assumptions about the underlying ecosystems, so conclusions from benchmark studies are limited to the ecosystems they mimic. Ideally, simulations are therefore based on genomes that realistically resemble the particular metagenomic community under study.
We developed Tamock to facilitate the realistic simulation of metagenomic reads that follow the composition of a metagenomic community, based on real sequence data. Benchmark samples can be created from all genomes and taxonomic domains present in NCBI RefSeq. Tamock automatically determines taxonomic profiles from shotgun sequence data, selects reference genomes accordingly and uses them to simulate metagenomic reads. We present an example use case for Tamock by assessing assembly and binning method performance for selected microbiomes.
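The profile-to-simulation step described above can be illustrated with a minimal Python sketch. It is not Tamock's actual implementation or interface. It assumes that each read already carries a taxon label, and that a lookup table maps taxa to reference genomes. The function names, the taxon labels and the genome identifiers are all hypothetical. The sketch derives relative abundances from the labels and splits a target number of simulated reads across the taxa that have a reference genome, using largest-remainder rounding so that the counts add up exactly.

```python
from collections import Counter


def profile_from_assignments(read_taxa):
    """Return relative abundance per taxon from per-read taxon labels."""
    counts = Counter(read_taxa)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def allocate_reads(profile, references, total_reads):
    """Split total_reads across reference genomes, proportional to abundance.

    Taxa without an entry in `references` are skipped and the remaining
    abundances are renormalised (an assumption of this sketch).
    """
    usable = {t: a for t, a in profile.items() if t in references}
    if not usable:
        raise ValueError("no taxon in the profile has a reference genome")
    scale = sum(usable.values())
    exact = {t: total_reads * a / scale for t, a in usable.items()}

    # Round down first, then hand out the leftover reads to the taxa
    # with the largest fractional parts (largest-remainder method).
    alloc = {references[t]: int(x) for t, x in exact.items()}
    leftover = total_reads - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - int(exact[t]), reverse=True)
    for taxon in by_remainder[:leftover]:
        alloc[references[taxon]] += 1
    return alloc


# Hypothetical input: ten classified reads and two available references.
read_taxa = ["taxon_A"] * 6 + ["taxon_B"] * 3 + ["taxon_C"] * 1
references = {"taxon_A": "genome_A", "taxon_B": "genome_B"}

profile = profile_from_assignments(read_taxa)
allocation = allocate_reads(profile, references, total_reads=100)
print(allocation)
```

The resulting per-genome read counts could then be handed to any read simulator. How taxa without a reference genome are treated is a design choice; here they are simply dropped and the remaining abundances rescaled.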
Tamock facilitates automated simulation of habitat-specific benchmark metagenomic data based on real sequence data. It is implemented as a user-friendly command-line application and provides extensive additional information along with the simulated benchmark data. The resulting benchmarks enable an assessment of computational methods, workflows, and parameters specifically for the habitat or ecosystem of a given metagenomic study.
Source code, documentation and install instructions are freely available on GitHub (https://github.com/gerners/tamock).