Martínez-Romero Marcos, Horridge Matthew, Mistry Nilesh, Weyhmiller Aubrie, Yu Jimmy K, Fujimoto Alissa, Henry Aria, O'Connor Martin J, Sier Ashley, Suber Stephanie, Akdogan Mete U, Cao Yan, Valliappan Somu, Mieczkowska Joanna O, Krishnamurthy Ashok, Keller Michael A, Musen Mark A
Stanford University, Stanford Center for Biomedical Informatics Research, Palo Alto, CA, United States.
Booz Allen Hamilton Inc., McLean, VA, United States.
JMIR Public Health Surveill. 2025 Aug 20;11:e72677. doi: 10.2196/72677.
The COVID-19 pandemic exposed significant limitations in existing data infrastructure, particularly the lack of systems for rapidly collecting, integrating, and analyzing data to support timely and evidence-based public health responses. These shortcomings hampered efforts to conduct comprehensive analyses and make rapid, data-driven decisions in response to emerging threats. To overcome these challenges, the US National Institutes of Health launched the Rapid Acceleration of Diagnostics (RADx) initiative. A key component of this initiative is the RADx Data Hub, a centralized, cloud-based platform designed to support data sharing, harmonization, and reuse across multiple COVID-19 research programs and data sources.
We aim to present the design, implementation, and capabilities of the RADx Data Hub, a cloud-based platform developed to support findable, accessible, interoperable, and reusable (FAIR) data practices and to enable secondary analyses of the COVID-19-related data contributed by a nationwide network of researchers.
The RADx Data Hub was developed on a scalable cloud infrastructure, grounded in the FAIR data principles. The platform integrates heterogeneous data types (including clinical data, diagnostic test results, behavioral data, and social determinants of health) submitted by over 100 research organizations across 46 US states and territories. The data pipeline includes automated and manual processes for deidentification, quality validation, expert curation, and harmonization. Metadata standards are enforced using tools such as the Center for Expanded Data Annotation and Retrieval (CEDAR) Workbench and BioPortal. Data files are structured using a unified specification to support consistent representation and machine-actionable metadata.
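As an illustration of what an automated quality-validation step of this kind might look like, the following minimal sketch checks a tabular data file against a data dictionary. The variable names, rules, and the `validate_rows` helper are hypothetical assumptions made for this example; they are not taken from the RADx Data Hub specification or its pipeline.

```python
import csv
import io

# Hypothetical data dictionary: variable name -> expected type and allowed values.
# These names and rules are illustrative assumptions, not the RADx specification.
DATA_DICTIONARY = {
    "participant_id": {"type": str, "allowed": None},
    "age": {"type": int, "allowed": None},
    "test_result": {"type": str, "allowed": {"positive", "negative", "inconclusive"}},
}


def validate_rows(csv_text, dictionary):
    """Return a list of (row_number, column, message) problems found in csv_text.

    Row number 0 is used for problems with the header itself.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []

    # Structural check: every variable in the dictionary must appear as a column.
    for column in dictionary:
        if column not in header:
            problems.append((0, column, "missing column"))

    present = [column for column in dictionary if column in header]
    for row_number, row in enumerate(reader, start=1):
        for column in present:
            rule = dictionary[column]
            value = row.get(column)
            if value is None or value == "":
                problems.append((row_number, column, "empty value"))
                continue
            # Type check: the value must be convertible to the declared type.
            try:
                rule["type"](value)
            except ValueError:
                problems.append((row_number, column, "wrong type"))
                continue
            # Value-set check: only applied when the dictionary lists allowed values.
            if rule["allowed"] is not None and value not in rule["allowed"]:
                problems.append((row_number, column, "unexpected value"))
    return problems


if __name__ == "__main__":
    sample = (
        "participant_id,age,test_result\n"
        "P001,34,positive\n"
        "P002,abc,negative\n"
        "P003,51,maybe\n"
    )
    for problem in validate_rows(sample, DATA_DICTIONARY):
        print(problem)
```

In this sketch, problems are collected rather than raised so that a curator could review all issues in a file at once, which fits a pipeline that combines automated checks with manual curation.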
As of May 2025, the RADx Data Hub hosts 187 studies and over 1700 data files, spanning 4 RADx programs: RADx Underserved Populations (RADx-UP), RADx Radical (RADx-rad), RADx Tech, and RADx Digital Health Technologies (RADx DHT). The Study Explorer and Analytics Workbench components enable researchers to discover relevant studies, inspect rich metadata, and conduct analyses within a secure cloud-based environment. Harmonized data conforming to a core set of common data elements facilitate cross-study integration and support secondary use. The platform provides persistent identifiers (digital object identifiers) for each study and supports access to structured metadata that adhere to the CEDAR specification, available in both JSON and YAML formats for seamless integration into computational workflows.
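To make the idea of cross-study integration on common data elements concrete, the sketch below stacks records from two studies while keeping only a shared set of columns. The element names in `CORE_CDES`, the study identifiers, and the `stack_on_core_cdes` helper are hypothetical and chosen only for illustration; the actual RADx common data elements and file layout may differ.

```python
# Hypothetical core common data elements (CDEs); illustrative names only,
# not the actual RADx CDE list.
CORE_CDES = ["study_id", "age", "sex", "test_result"]


def stack_on_core_cdes(studies, core_cdes):
    """Combine rows from several studies, keeping only the core CDE columns.

    `studies` maps a study identifier to a list of row dictionaries.
    A CDE that a study did not collect is recorded as None, and
    study-specific variables outside the core set are dropped.
    """
    combined = []
    for study_id, rows in studies.items():
        for row in rows:
            record = {cde: row.get(cde) for cde in core_cdes}
            # Tag each record with its source study so provenance is preserved.
            record["study_id"] = study_id
            combined.append(record)
    return combined


if __name__ == "__main__":
    studies = {
        "study_a": [{"age": 40, "sex": "F", "test_result": "positive", "site": "clinic-1"}],
        "study_b": [{"age": 29, "test_result": "negative"}],
    }
    for record in stack_on_core_cdes(studies, CORE_CDES):
        print(record)
```

Because every harmonized record carries the same keys, downstream secondary analyses can treat the combined list as a single table, with missing elements made explicit rather than silently omitted.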
The RADx Data Hub successfully addresses key data integration challenges by providing a centralized, FAIR-compliant platform for public health research. Its adaptable architecture and data management practices are designed to support secondary analyses and can be repurposed for other scientific disciplines, strengthening data infrastructure and enhancing preparedness for future health crises.