Tamock: simulation of habitat-specific benchmark data in metagenomics.

Affiliations

Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria.

Department Bioengineering, University of Applied Sciences FH Campus Wien, Vienna, Austria.

Publication information

BMC Bioinformatics. 2021 May 1;22(1):227. doi: 10.1186/s12859-021-04154-z.

DOI:10.1186/s12859-021-04154-z

PMID:33932979

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8088724/

Abstract

BACKGROUND

Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation. The results of such benchmarks depend on the assumptions made about the underlying ecosystems, so conclusions from benchmark studies are limited to the ecosystems they mimic. Ideally, simulations should therefore be based on genomes that realistically resemble a particular metagenomic community.

RESULTS

We developed Tamock to facilitate the realistic simulation of metagenomic reads according to a metagenomic community, based on real sequence data. Benchmark samples can be created from all genomes and taxonomic domains present in NCBI RefSeq. Tamock automatically determines taxonomic profiles from shotgun sequence data, selects reference genomes accordingly, and uses them to simulate metagenomic reads. We present an example use case for Tamock by assessing the performance of assembly and binning methods for selected microbiomes.
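
The profile-driven step described above (turn a taxonomic profile into per-genome read counts for a simulator) can be illustrated with a minimal sketch. This is not Tamock's implementation. The function names, the taxon labels, and the reference identifiers below are hypothetical placeholders, and the proportional allocation with largest-remainder rounding is only one plausible way to do it.

```python
from typing import Dict


def allocate_reads(profile: Dict[str, int], total_reads: int) -> Dict[str, int]:
    """Split total_reads across taxa in proportion to their classified read counts.

    Largest-remainder rounding keeps the allocations summing to total_reads.
    """
    classified = sum(profile.values())
    if classified <= 0:
        raise ValueError("profile contains no classified reads")
    # Exact (fractional) share of the simulated reads for each taxon.
    exact = {taxon: total_reads * count / classified for taxon, count in profile.items()}
    alloc = {taxon: int(share) for taxon, share in exact.items()}
    leftover = total_reads - sum(alloc.values())
    # Give the remaining reads to the taxa with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - alloc[t], reverse=True)
    for taxon in by_remainder[:leftover]:
        alloc[taxon] += 1
    return alloc


def select_references(alloc: Dict[str, int], catalog: Dict[str, str]) -> Dict[str, int]:
    """Map each taxon that has a reference genome to {reference_id: reads_to_simulate}."""
    return {catalog[t]: n for t, n in alloc.items() if t in catalog and n > 0}


# Hypothetical profile: classified read counts per taxon from a real sample.
profile = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
# Hypothetical catalog: taxon -> placeholder reference genome identifier.
catalog = {"taxon_A": "REF_A", "taxon_B": "REF_B"}

plan = select_references(allocate_reads(profile, 1000), catalog)
print(plan)
```

In this sketch, taxa without an entry in the catalog are simply skipped. A real workflow would have to decide how to treat such taxa and unclassified reads, and would pass the resulting counts to a read simulator.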

CONCLUSIONS

Tamock facilitates automated simulation of habitat-specific benchmark metagenomic data based on real sequence data. It is implemented as a user-friendly command-line application that provides extensive additional information along with the simulated benchmark data. The resulting benchmarks enable assessment of computational methods, workflows, and parameters specifically for the habitat or ecosystem of a given metagenomic study.

AVAILABILITY

Source code, documentation, and installation instructions are freely available on GitHub (https://github.com/gerners/tamock).
