Morais Diego A A, Cavalcante João V F, Monteiro Shênia S, Pasquali Matheus A B, Dalmolin Rodrigo J S
Bioinformatics Multidisciplinary Environment, Federal University of Rio Grande do Norte, Natal, Brazil.
Graduate Program in Engineering and Natural Resources Management, Federal University of Campina Grande, Campina Grande, Brazil.
Front Genet. 2022 Mar 7;13:814437. doi: 10.3389/fgene.2022.814437. eCollection 2022.
Metagenomic studies reveal details about the taxonomic composition of microbial communities and the functions they perform. Because a complete metagenomic analysis requires different tools for different purposes, selecting and setting up these tools remains challenging. Furthermore, the chosen toolset affects the accuracy, the formatting, and the functional identifiers reported in the results, which in turn affects how the results are interpreted and the biological answer obtained. We therefore surveyed state-of-the-art tools available in the literature, created simulated datasets, and performed benchmarks to design a sensitive and flexible metagenomic analysis pipeline. Here we present MEDUSA, an efficient pipeline for conducting comprehensive metagenomic analyses. It performs preprocessing, assembly, alignment, taxonomic classification, and functional annotation on shotgun data, and it supports user-built dictionaries to transfer annotations to any functional identifier. MEDUSA includes several tools, such as fastp, Bowtie2, DIAMOND, Kaiju, and MEGAHIT, as well as a novel tool implemented in Python to transfer annotations to BLAST/DIAMOND alignment results. These tools are installed via Conda, and the workflow is managed by Snakemake, which eases setup and execution. Compared with MEGAN 6 Community Edition, MEDUSA correctly identifies more species, especially the less abundant ones, and is better suited for functional analysis using Gene Ontology identifiers.
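The dictionary-based annotation transfer mentioned above can be illustrated with a minimal Python sketch. This is not MEDUSA's actual tool; it only assumes that alignments are in the tabular BLAST/DIAMOND layout, where the first two columns are the query and subject identifiers, and that the user-built dictionary is a two-column tab-separated file mapping subject identifiers to functional identifiers. The function names `load_dictionary` and `transfer_annotations`, the read names, and the accessions in the toy data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_dictionary(handle):
    """Read a two-column TSV (subject ID, functional identifier).

    Hypothetical dictionary layout; one subject may map to several identifiers.
    """
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]].add(row[1])
    return mapping


def transfer_annotations(alignments, mapping):
    """Yield (query, subject, identifier) for every hit found in the dictionary.

    Assumes tabular alignment output with the query ID in column 1 and the
    subject ID in column 2. Hits whose subject is not in the dictionary are
    skipped.
    """
    for row in csv.reader(alignments, delimiter="\t"):
        if len(row) < 2:
            continue
        query, subject = row[0], row[1]
        for identifier in sorted(mapping.get(subject, ())):
            yield query, subject, identifier


# Toy inputs for illustration only (not real results).
dictionary_tsv = (
    "P12345\tGO:0008150\n"
    "P12345\tGO:0003674\n"
    "Q99999\tGO:0005575\n"
)
alignment_tsv = (
    "read1\tP12345\t98.5\n"
    "read2\tQ99999\t91.0\n"
    "read3\tX00000\t88.0\n"
)

mapping = load_dictionary(io.StringIO(dictionary_tsv))
annotated = list(transfer_annotations(io.StringIO(alignment_tsv), mapping))
for record in annotated:
    print("\t".join(record))
```

In this sketch, `read3` produces no output because its subject has no dictionary entry. Swapping the dictionary file is enough to report a different kind of functional identifier for the same alignments.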