ARGprofiler-a pipeline for large-scale analysis of antimicrobial resistance genes and their flanking regions in metagenomic datasets.

Affiliation

Research Group for Genomic Epidemiology, Technical University of Denmark, Henrik Dams Allé, Bygning 204, Kongens Lyngby 2800, Denmark.

Publication information

Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae086.

DOI:10.1093/bioinformatics/btae086

PMID:38377397

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10918635/

Abstract

MOTIVATION

Analyzing metagenomic data can be highly valuable for understanding the function and distribution of antimicrobial resistance genes (ARGs). However, there is a need for standardized and reproducible workflows to ensure the comparability of studies, as the current options involve various tools and reference databases, each designed with a specific purpose in mind.

RESULTS

In this work, we have created the workflow ARGprofiler to process large amounts of raw sequencing reads for studying the composition, distribution, and function of ARGs. ARGprofiler tackles the challenge of deciding which reference database to use by providing the PanRes database of 14 078 unique ARGs, which combines several existing collections into one. Our pipeline is designed not only to produce abundance tables of genes and microbes but also to reconstruct the flanking regions of ARGs with ARGextender. ARGextender is a bioinformatic approach that combines KMA and SPAdes to recruit reads for a targeted de novo assembly. While our focus is on ARGs, the pipeline also creates Mash sketches for fast searching and comparison of sequencing runs.
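
To illustrate why sketches allow sequencing runs to be compared quickly, the following is a minimal Python sketch of the general bottom-k MinHash idea that Mash builds on. It is not the pipeline's code and not Mash's implementation: the function names, the tiny k-mer length, the sketch size, and the use of BLAKE2 as the hash are all assumptions chosen for readability, and strand handling (canonical k-mers) is omitted. The distance formula follows the published Mash distance, D = -(1/k) ln(2j / (1 + j)), where j is the estimated Jaccard index.

```python
import hashlib
import math


def kmers(seq, k):
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=16):
    """Bottom-k MinHash sketch: the `size` smallest k-mer hash values.

    Hypothetical toy parameters; real tools use much larger k and size.
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(km.encode(), digest_size=8).digest(), "big"
        )
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(a, b, size=16):
    """Estimate the Jaccard index of two k-mer sets from their sketches."""
    merged = sorted(set(a) | set(b))[:size]  # bottom-k of the union
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j, k):
    """Convert a Jaccard estimate into a Mash-style distance in [0, 1]."""
    if j == 0:
        return 1.0
    return min(1.0, max(0.0, -math.log(2 * j / (1 + j)) / k))


if __name__ == "__main__":
    run_a = sketch("ACGTACGTTGCAGGCTAACGTTAGC")
    run_b = sketch("ACGTACGTTGCAGGCTAACGTAAGC")
    j = jaccard_estimate(run_a, run_b)
    print(j, mash_distance(j, k=5))
```

Because each run is reduced to a small fixed-size list of hashes, comparing two runs costs time proportional to the sketch size rather than to the size of the raw read data.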

AVAILABILITY AND IMPLEMENTATION

The ARGprofiler pipeline is a Snakemake workflow that supports the reuse of metagenomic sequencing data and is easily installable and maintained at https://github.com/genomicepidemiology/ARGprofiler.
