SEQ2MGS: an effective tool for generating realistic artificial metagenomes from the existing sequencing data.

Authors

Van Camp Pieter-Jan, Porollo Aleksey

Affiliations

Department of Biomedical Informatics, University of Cincinnati, Cincinnati, OH, USA.

Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.

Publication information

NAR Genom Bioinform. 2022 Jul 25;4(3):lqac050. doi: 10.1093/nargab/lqac050. eCollection 2022 Sep.

Abstract

Assessing bioinformatics tools for metagenomic analysis of whole-genome sequencing data requires realistic benchmark sets. We developed an effective and simple generator of artificial metagenomes from real sequencing experiments. The tool (SEQ2MGS) analyzes the input FASTQ files, precomputes genomic content, and blends shotgun reads from different sequenced isolates, or spikes isolate(s) into a real metagenome, in desired proportions. SEQ2MGS eliminates the need to simulate sequencing platform variations, read distributions, and the presence of plasmids, viruses, and contamination. The tool is especially useful for quickly generating multiple complex samples that include new or understudied organisms, even those without assembled genomes. For illustration, we first demonstrated the ease of use of SEQ2MGS for simulating the altered Schaedler flora (ASF), in comparison with the metagenomics generators Grinder and CAMISIM. Next, we emulated the emergence of a pathogen in the human gut microbiome and observed that Kraken, Centrifuge, and MetaPhlAn, while correctly identifying the pathogen, produced inconsistent results for the rest of the real metagenome. Finally, using the MG-RAST platform, we confirmed that SEQ2MGS properly transfers genomic information from an isolate into the simulated metagenome: the antimicrobial resistance genes expected to appear, relative to the original metagenome, were correctly identified.
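The blending step described in the abstract can be pictured with a minimal Python sketch. This is not the SEQ2MGS implementation: the function names `parse_fastq` and `blend_reads`, the `source=` header tag, and the choice to apportion by read count (rather than by the precomputed genomic content the tool uses) are all assumptions made for illustration only.

```python
import random
from typing import Dict, Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality).
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; blank lines are skipped (hypothetical helper)."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    # Step through complete 4-line groups; a trailing partial record is ignored.
    for i in range(0, len(stripped) - 3, 4):
        header, seq, sep, qual = stripped[i:i + 4]
        yield (header, seq, sep, qual)


def blend_reads(
    sources: Dict[str, List[FastqRecord]],
    proportions: Dict[str, float],
    total_reads: int,
    seed: int = 0,
) -> List[FastqRecord]:
    """Subsample each source by its relative weight and shuffle the result.

    Assumption: proportions are applied to read counts, which is a
    simplification of how a real tool would account for genome content.
    """
    rng = random.Random(seed)
    weight_sum = sum(proportions[name] for name in sources)
    mixed: List[FastqRecord] = []
    for name, reads in sources.items():
        n = round(total_reads * proportions[name] / weight_sum)
        if n > len(reads):
            raise ValueError(f"source '{name}' has only {len(reads)} reads, need {n}")
        # Sample without replacement so no real read is duplicated.
        for header, seq, sep, qual in rng.sample(reads, n):
            # Tag each read with its origin so the ground truth stays traceable.
            mixed.append((f"{header} source={name}", seq, sep, qual))
    rng.shuffle(mixed)
    return mixed
```

Spiking an isolate into a real metagenome corresponds to passing two sources, for example `{"isolate": 1, "metagenome": 3}` as weights. Because the reads come from real sequencing runs, platform error profiles and contaminants are carried over rather than simulated.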

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4390/9310082/e5ab3c3c5f68/lqac050fig1.jpg