Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, MD, USA.
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
Microb Genom. 2023 Dec;9(12). doi: 10.1099/mgen.0.001145.
Fast, efficient public health actions require well-organized and coordinated systems that can supply timely and accurate knowledge. Public databases of pathogen genomic data, such as the International Nucleotide Sequence Database Collaboration (INSDC), have become essential tools for efficient public health decisions. However, these international resources began primarily for academic purposes, rather than for surveillance or interventions. Now, queries need not only to access the whole genomes of multiple pathogens but also to make connections using robust contextual metadata to identify issues of public health relevance. Databases that over time developed a patchwork of submission formats and requirements need to be consistently organized and coordinated internationally to allow effective searches. To help resolve these issues, we propose a common pathogen data structure called the Pathogen Data Object Model (DOM) that will formalize the minimum pieces of sequence data and contextual data necessary for general public health uses, while recognizing that submitters will likely withhold a wide range of non-public contextual data. Further, we propose that contributors use the Pathogen DOM for all pathogen submissions (bacterial, viral, fungal, and parasitic), which will simplify data submissions and provide a consistent and transparent data structure for downstream data analyses. We also highlight how improved submission tools can support the Pathogen DOM, offering users additional easy-to-use methods to ensure this structure is followed.
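As a rough illustration of what "formalizing the minimum pieces of sequence data and contextual data" could look like in a submission tool, the following Python sketch pairs a sample with its sequence accessions and checks that a minimum set of contextual fields is filled in. The class name, the function name, the identifiers, and the list of required fields are all hypothetical placeholders chosen for this example; the abstract does not enumerate the actual Pathogen DOM elements, so this should not be read as the authors' specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical minimum contextual fields (an assumption for illustration;
# the abstract does not list the real Pathogen DOM requirements).
REQUIRED_CONTEXT = ("organism", "collection_date", "geo_loc_name", "isolation_source")


@dataclass
class PathogenRecord:
    """One submitted sample: links to sequence data plus public contextual data."""
    sample_id: str
    sequence_accessions: List[str]
    context: Dict[str, str] = field(default_factory=dict)


def missing_fields(record: PathogenRecord) -> List[str]:
    """Return the names of minimum elements that are absent or blank."""
    problems = []
    if not record.sequence_accessions:
        problems.append("sequence_accessions")
    for name in REQUIRED_CONTEXT:
        if not record.context.get(name, "").strip():
            problems.append(name)
    return problems


# Example: a record whose submitter left out the isolation source.
record = PathogenRecord(
    sample_id="SAMPLE-0001",          # placeholder identifier
    sequence_accessions=["RUN-0001"],  # placeholder accession
    context={
        "organism": "Salmonella enterica",
        "collection_date": "2023",
        "geo_loc_name": "USA",
    },
)
print(missing_fields(record))  # ['isolation_source']
```

Only the minimum public fields are checked here; any additional contextual data that a submitter chooses to withhold simply never enters the record, which mirrors the abstract's point that non-public data stay with the submitter.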