School of Applied Mathematics, Fundação Getulio Vargas, Rio de Janeiro, Brazil.
The Global Research and Analysis for Public Health Network, Geneva, Switzerland.
J Med Internet Res. 2023 Mar 6;25:e40554. doi: 10.2196/40554.
Guaranteeing durability, provenance, accessibility, and trust in open data sets can be challenging for researchers and organizations that rely on public repositories of data critical for epidemiology and other health analytics. The required data repositories are often difficult to locate and may require conversion to a standard data format. Data-hosting websites may also change or become unavailable without warning. A single change to the rules in one repository can hinder updating a public dashboard reliant on data pulled from external sources. These concerns are particularly challenging at the international level, because policies on systems aimed at harmonizing health and related data are typically dictated by national governments to serve their individual needs.
In this paper, we introduce a comprehensive public health data platform, EpiGraphHub, that aims to provide a single interoperable repository for open health and related data.
The platform, curated by the international research community, allows secure local integration of sensitive data while facilitating the development of data-driven applications and reports for decision-makers. Its main components include centrally managed databases with fine-grained access control to data, fully automated and documented data collection and transformation, and a powerful web-based data exploration and visualization tool.
EpiGraphHub is already being used for hosting a growing collection of open data sets and for automating epidemiological analyses based on them. The project has also released an open-source software library with the analytical methods used in the platform.
The platform is fully open source and open to external users. It is in active development with the goal of maximizing its value for large-scale public health studies.