Helmholtz Institute for Functional Marine Biodiversity, 26129, Oldenburg, Germany.
Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research, 27570, Bremerhaven, Germany.
Sci Data. 2024 Sep 4;11(1):967. doi: 10.1038/s41597-024-03778-z.
The remarkable pace of genomic data generation is rapidly transforming our understanding of life at the micron scale. Yet this data stream also creates challenges for team science. A single microbe can have multiple versions of genome architecture, functional gene annotations, and gene identifiers; additionally, the lack of mechanisms for collating and preserving advances in this knowledge raises barriers to community coalescence around shared datasets. "Digital Microbes" are frameworks for interoperable and reproducible collaborative science through open source, community-curated data packages built on a (pan)genomic foundation. Housed within an integrative software environment, Digital Microbes ensure real-time alignment of research efforts for collaborative teams and facilitate novel scientific insights as new layers of data are added. Here we describe two Digital Microbes: 1) the heterotrophic marine bacterium Ruegeria pomeroyi DSS-3 with > 100 transcriptomic datasets from lab and field studies, and 2) the pangenome of the cosmopolitan marine heterotroph Alteromonas containing 339 genomes. Examples demonstrate how an integrated framework collating public (pan)genome-informed data can generate novel and reproducible findings.