使用 AMDirT 促进古代宏基因组数据的可访问、快速和适当处理。

Facilitating accessible, rapid, and appropriate processing of ancient metagenomic data with AMDirT.

机构信息

Cluster of Excellence "Balance of the Microverse", Leibniz Institute for Natural Product Research and Infection Biology Hans Knöll Institute, Adolf-Reichwein-Straße 23, Jena, Thuringia, 07745, Germany.

Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Pl. 6, Leipzig, Saxony, 04103, Germany.

出版信息

F1000Res. 2024 May 28;12:926. doi: 10.12688/f1000research.134798.2. eCollection 2023.

DOI:10.12688/f1000research.134798.2

PMID:39262445

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC11387932/

Abstract

BACKGROUND

Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.org) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications.

METHODS

Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate rapid data filtering and downloading of ancient metagenomic data, as well as improving automated metadata curation and validation for AncientMetagenomeDir.

RESULTS

AncientMetagenomeDir was extended to include standardised metadata of over 6000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community.

CONCLUSIONS

Together, both standardised metadata reporting and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.

摘要

背景

在选择公共宏基因组测序数据集以供新的生物分析重复使用时，访问样本级别的元数据非常重要。标准、预防措施和古代宏基因组学社区（SPAAM，https://spaam-community.org）之前发布了 AncientMetagenomeDir，这是一个经过精心整理和标准化的样本元数据表格的集合，用于从古代样本中生成的宏基因组和微生物基因组数据集。然而，虽然样本级别的信息对于识别相关样本以纳入新项目非常有用，但下一代测序 (NGS) 文库构建和测序元数据对于适当处理古代宏基因组数据也至关重要。目前，当实验室和生物信息学元数据以基于散文的出版物中的异质方式记录时，恢复用于下载和准备此类数据的信息非常困难。

方法

通过一系列基于社区的黑客马拉松活动，AncientMetagenomeDir 进行了更新，以提供现有和新的古代宏基因组样本的标准化库级元数据。与此同时，开发了配套工具 'AMDirT'，以方便快速筛选和下载古代宏基因组数据，以及改进 AncientMetagenomeDir 的自动化元数据整理和验证。