Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA.
Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA.
Sci Data. 2024 Jan 17;11(1):81. doi: 10.1038/s41597-023-02877-7.
Shotgun metagenomic sequencing comprehensively samples the DNA of a microbial sample. Choosing the best bioinformatics processing package can be daunting because of the wide variety of tools available. Here, we assessed publicly available shotgun metagenomics processing packages/pipelines, including bioBakery, Just a Microbiology System (JAMS), Whole metaGenome Sequence Assembly V2 (WGSA2), and Woltka, using 19 publicly available mock community samples and a set of five constructed pathogenic gut microbiome samples. We also include a workflow for labelling bacterial scientific names with NCBI taxonomy identifiers, which gives better resolution when assessing results. Accuracy was assessed for all pipelines and mock samples using the Aitchison distance, a sensitivity metric, and total False Positive Relative Abundance. Overall, bioBakery4 performed best on most of the accuracy metrics, while JAMS and WGSA2 had the highest sensitivities. Furthermore, bioBakery is commonly used and requires only a basic knowledge of command line usage. This work provides an unbiased assessment of the performance of shotgun metagenomics packages using mock community sequence data.
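The three accuracy measures named above can be illustrated with a minimal sketch. It assumes each profile is a mapping from NCBI taxonomy identifier to relative abundance, that zeros are handled with a small pseudocount before the centered log-ratio transform, and that sensitivity means the fraction of expected taxa that were detected. These definitions, the function names, and the toy profiles are illustrative assumptions, not the exact procedure used in the study.

```python
import math


def clr(values):
    """Centered log-ratio transform of strictly positive values."""
    logs = [math.log(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(expected, observed, pseudocount=1e-6):
    """Euclidean distance between CLR-transformed profiles.

    The pseudocount is an assumed zero-replacement strategy; other
    strategies would give different distances.
    """
    taxa = sorted(set(expected) | set(observed))

    def closed(profile):
        # Add the pseudocount, then renormalize so the values sum to 1.
        raw = [profile.get(t, 0.0) + pseudocount for t in taxa]
        total = sum(raw)
        return [v / total for v in raw]

    e = clr(closed(expected))
    o = clr(closed(observed))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e, o)))


def sensitivity(expected, observed):
    """Fraction of expected taxa reported with non-zero abundance."""
    expected_taxa = {t for t, v in expected.items() if v > 0}
    detected = {t for t in expected_taxa if observed.get(t, 0.0) > 0}
    return len(detected) / len(expected_taxa)


def false_positive_relative_abundance(expected, observed):
    """Share of observed abundance assigned to taxa absent from the mock."""
    total = sum(observed.values())
    fp = sum(v for t, v in observed.items() if expected.get(t, 0.0) == 0)
    return fp / total


# Hypothetical toy profiles keyed by NCBI taxonomy ID.
expected = {562: 0.50, 1280: 0.25, 1423: 0.25}
observed = {562: 0.60, 1280: 0.30, 287: 0.10}

print(aitchison_distance(expected, observed))
print(sensitivity(expected, observed))
print(false_positive_relative_abundance(expected, observed))
```

Keying both profiles by taxonomy identifier rather than by scientific name avoids mismatches caused by synonyms or renamed species, which is the motivation for the labelling workflow mentioned in the abstract.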