Tamock: simulation of habitat-specific benchmark data in metagenomics.

Affiliations

Division of Computational System Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria.

Department Bioengineering, University of Applied Sciences FH Campus Wien, Vienna, Austria.

Publication information

BMC Bioinformatics. 2021 May 1;22(1):227. doi: 10.1186/s12859-021-04154-z.

Abstract

BACKGROUND

Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation. The results of metagenomic benchmarks depend on the assumptions made about the underlying ecosystems. Conclusions from benchmark studies are therefore limited to the ecosystems they mimic. Ideally, simulations should thus be based on genomes that realistically resemble a particular metagenomic community.

RESULTS

We developed Tamock to facilitate the realistic simulation of metagenomic reads that mirror a given metagenomic community, based on real sequence data. Benchmark samples can be created from all genomes and taxonomic domains present in NCBI RefSeq. Tamock automatically determines taxonomic profiles from shotgun sequence data, selects reference genomes accordingly, and uses them to simulate metagenomic reads. We present an example use case for Tamock by assessing the performance of assembly and binning methods for selected microbiomes.
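The step from a taxonomic profile to a simulation plan can be illustrated with a minimal sketch. It assumes the profile is given as read counts per taxon (as a read classifier might report them) and uses a hypothetical catalog mapping each taxon to reference genome identifiers. It picks the first listed genome per taxon and splits a fixed read budget across the chosen genomes in proportion to the observed counts. The function names, the catalog, the identifiers `genome_A`/`genome_B`, and the first-genome selection rule are illustrative assumptions, not Tamock's actual implementation or selection criteria.

```python
from typing import Dict, List


def select_references(profile: Dict[str, int],
                      catalog: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each profiled taxon to one reference genome.

    Hypothetical rule: take the first identifier listed for the taxon in the
    catalog. Taxa with zero reads or without a reference genome are skipped.
    """
    selected: Dict[str, str] = {}
    for taxon, read_count in profile.items():
        genomes = catalog.get(taxon, [])
        if read_count > 0 and genomes:
            selected[taxon] = genomes[0]
    return selected


def allocate_reads(profile: Dict[str, int],
                   selected: Dict[str, str],
                   total_reads: int) -> Dict[str, int]:
    """Split total_reads across the selected genomes in proportion to the
    observed read counts (largest-remainder method, exact integer arithmetic),
    so the per-genome counts always sum to total_reads."""
    # Sum observed reads per genome (several taxa could map to one genome).
    weights: Dict[str, int] = {}
    for taxon, genome in selected.items():
        weights[genome] = weights.get(genome, 0) + profile[taxon]
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {}

    counts: Dict[str, int] = {}
    remainders: Dict[str, int] = {}
    for genome, weight in weights.items():
        counts[genome], remainders[genome] = divmod(total_reads * weight, total_weight)

    # Give the reads lost to rounding down to the genomes with the largest
    # remainders (ties broken by genome name for reproducibility).
    leftover = total_reads - sum(counts.values())
    ranked = sorted(weights, key=lambda g: (-remainders[g], g))
    for genome in ranked[:leftover]:
        counts[genome] += 1
    return counts


if __name__ == "__main__":
    # Read counts per taxon, as a read classifier might report them.
    profile = {"Bacteroides fragilis": 600, "Escherichia coli": 300,
               "unclassified": 100}
    # Hypothetical catalog of reference genome identifiers per taxon.
    catalog = {"Bacteroides fragilis": ["genome_A"],
               "Escherichia coli": ["genome_B"]}
    selected = select_references(profile, catalog)
    print(allocate_reads(profile, selected, 1000))
    # {'genome_A': 667, 'genome_B': 333}
```

In this sketch, reads from taxa without a reference genome (here "unclassified") are simply dropped and the read budget is redistributed over the remaining genomes; how unclassified reads should be handled, and how to choose among several candidate genomes per taxon, are design decisions this example does not model. The simulated reads themselves would then be generated per genome with a read simulator.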

CONCLUSIONS

Tamock facilitates automated simulation of habitat-specific benchmark metagenomic data based on real sequence data and is implemented as a user-friendly command-line application that provides extensive additional information along with the simulated benchmark data. The resulting benchmarks allow computational methods, workflows, and parameters to be assessed specifically for the metagenomic habitat or ecosystem under study.

AVAILABILITY

Source code, documentation and install instructions are freely available at GitHub ( https://github.com/gerners/tamock ).
