Decades in the Making: The Evolution of Digital Health Research Infrastructure Through Synthetic Data, Common Data Models, and Federated Learning

Authors

Austin Jodie A, Lobo Elton H, Samadbeik Mahnaz, Engstrom Teyl, Philip Reji, Pole Jason D, Sullivan Clair M

Affiliations

Queensland Digital Health Centre, Centre for Health Services Research, The University of Queensland, Brisbane, Australia.

The Office of the Chief Clinical Information Officer, eHealth Queensland, Brisbane, Australia.

Publication information

J Med Internet Res. 2024 Dec 20;26:e58637. doi: 10.2196/58637.

DOI:10.2196/58637

PMID:39705072

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11699496/

Abstract

Traditionally, medical research is based on randomized controlled trials (RCTs) for interventions such as drugs and operative procedures. However, increasingly, there is a need for health research to evolve. RCTs are expensive to run, are generally formulated with a single research question in mind, and analyze a limited dataset for a restricted period. Progressively, health decision makers are focusing on real-world data (RWD) to deliver large-scale longitudinal insights that are actionable. RWD are collected as part of routine care in real time using digital health infrastructure. For example, understanding the effectiveness of an intervention could be enhanced by combining evidence from RCTs with RWD, providing insights into long-term outcomes in real-life situations. Clinicians and researchers struggle in the digital era to harness RWD for digital health research in an efficient and ethically and morally appropriate manner. This struggle encompasses challenges such as ensuring data quality, integrating diverse sources, establishing governance policies, ensuring regulatory compliance, developing analytical capabilities, and translating insights into actionable strategies. The same way that drug trials require infrastructure to support their conduct, digital health also necessitates new and disruptive research data infrastructure. Novel methods such as common data models, federated learning, and synthetic data generation are emerging to enhance the utility of research using RWD, which are often siloed across health systems. A continued focus on data privacy and ethical compliance remains. The past 25 years have seen a notable shift from an emphasis on RCTs as the only source of practice-guiding clinical evidence to the inclusion of modern-day methods harnessing RWD. This paper describes the evolution of synthetic data, common data models, and federated learning supported by strong cross-sector collaboration to support digital health research. Lessons learned are offered as a model for other jurisdictions with similar RWD infrastructure requirements.
