University of British Columbia, Vancouver, BC, Canada.
Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada.
Microb Genom. 2023 Jan;9(1). doi: 10.1099/mgen.0.000908.
Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations and research. To be useful, pathogen genomics data must be interpreted using contextual data (metadata), which include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, variability in how contextual information is captured by different authorities and encoded in different databases poses challenges for data interpretation, integration and use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's web browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. To support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies.
Although the DataHarmonizer was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.