subMG automates data submission for metagenomics studies.

Author information

Tubbesing Tom, Schlüter Andreas, Sczyrba Alexander

Affiliations

Computational Metagenomics Group, Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstraße 27, 33615, Bielefeld, Germany.

IBG-5: Computational Metagenomics, Institute of Bio- and Geosciences (IBG), Forschungszentrum Jülich GmbH, c/o Centrum für Biotechnologie (CeBiTec), 33594, Bielefeld, Germany.

Publication information

BioData Min. 2025 Jun 5;18(1):38. doi: 10.1186/s13040-025-00453-w.

DOI:10.1186/s13040-025-00453-w

PMID:40474206

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12142852/

Abstract

BACKGROUND

Publicly available metagenomics datasets are crucial for ensuring the reproducibility of scientific findings and supporting contemporary large-scale studies. However, submitting a comprehensive metagenomics dataset is both cumbersome and time-consuming. It requires including sample information, sequencing reads, assemblies, binned contigs, metagenome-assembled genomes (MAGs), and appropriate metadata. As a result, metagenomics studies are often published with incomplete datasets or, in some cases, without any data at all. subMG addresses this challenge by simplifying and automating the data submission process, thereby encouraging broader and more consistent data sharing.

RESULTS

subMG streamlines the process of submitting metagenomics study results to the European Nucleotide Archive (ENA) by allowing researchers to input files and metadata from their studies in a single form and automating downstream tasks that otherwise require extensive manual effort and expertise. The tool comes with comprehensive documentation as well as example data tailored for different use cases and can be operated via the command line or a graphical user interface (GUI), making it easily deployable to a wide range of potential users.

CONCLUSIONS

By simplifying the submission of genome-resolved metagenomics study datasets, subMG significantly reduces the time, effort, and expertise required from researchers, thus paving the way for more numerous and comprehensive data submissions in the future. An increased availability of well-documented and FAIR data can benefit future research, particularly in meta-analyses and comparative studies.
