Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, United States.
Center for Data Solutions, University of Florida College of Medicine - Jacksonville, Jacksonville, FL, United States.
JMIR Res Protoc. 2024 Jul 8;13:e57981. doi: 10.2196/57981.
Pediatric asthma is a heterogeneous disease; however, current characterizations of its subtypes are limited. Machine learning (ML) methods are well suited to identifying subtypes. In particular, deep neural networks can learn patient representations from the longitudinal information captured in electronic health records (EHRs) while taking future outcomes into account. However, the traditional approach to subtype analysis requires large amounts of EHR data, which may contain protected health information and therefore raise concerns about patient privacy. Federated learning is a key technology for addressing these privacy concerns while preserving the accuracy and performance of ML algorithms. It could enable multisite development and implementation of ML algorithms and thereby facilitate the translation of artificial intelligence into clinical practice.
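To make the federated learning idea concrete, the following is a minimal sketch of federated averaging, assuming a toy linear model rather than the deep neural networks the protocol describes. It is not the study's implementation. The function names (`local_update`, `federated_average`, `run_rounds`), the learning rate, and the example data are all hypothetical. Each site fits the model on its own records and shares only model weights; patient-level data never leave the site.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
SiteData = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (features, labels)


def local_update(weights: Vector, features, labels,
                 lr: float = 0.1, epochs: int = 5) -> Vector:
    """Run a few epochs of gradient descent for least-squares linear
    regression on ONE site's data. Only the weights are returned."""
    w = list(weights)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(site_weights: Sequence[Vector],
                      site_sizes: Sequence[int]) -> Vector:
    """Combine site models into one global model, weighting each site
    by its number of records."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def run_rounds(sites: Sequence[SiteData], dim: int, rounds: int = 20) -> Vector:
    """Coordinator loop: broadcast the global weights, collect each
    site's locally updated weights, and average them."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in sites]
        sizes = [len(y) for _, y in sites]
        global_w = federated_average(updates, sizes)
    return global_w


# Hypothetical data: two sites whose records both follow y = 2 * x.
site_a = ([[1.0], [2.0]], [2.0, 4.0])
site_b = ([[3.0]], [6.0])
model = run_rounds([site_a, site_b], dim=1)
```

In this sketch the coordinator sees only weight vectors and record counts. A production system would typically add further safeguards, such as secure aggregation or differential privacy, which the sketch does not attempt.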
This study aims to develop a research protocol for implementing federated ML across a large clinical research network to identify pediatric asthma subtypes and characterize their progression over time.
This mixed methods study draws on data from, and clinicians practicing within, the OneFlorida+ clinical research network, a large regional network that holds linked, longitudinal, patient-level real-world data (RWD) for more than 20 million patients from Florida, Georgia, and Alabama in the United States. To characterize the subtypes, we will use OneFlorida+ data from 2011 to 2023 and develop a research-grade pediatric asthma computable phenotype and a clinical natural language processing pipeline to identify pediatric patients with asthma aged 2-18 years. We will then apply federated learning to characterize pediatric asthma subtypes and their temporal progression. Using the Promoting Action on Research Implementation in Health Services framework, we will conduct focus groups with practicing pediatric asthma clinicians within the OneFlorida+ network to investigate the clinical utility of the subtypes. Following a user-centered design process, we will create prototypes that visualize the subtypes in the EHR to best support the clinical management of children with asthma.
OneFlorida+ data from 2011 to 2023 have been collected for 411,628 patients aged 2-18 years, along with 11,156,148 clinical notes. We expect to complete the computable phenotyping within the first year of the project and the subtyping during the second and third years. We will then conduct the focus groups and carry out the user-centered design work in the fourth and fifth years.
Pediatric asthma subtypes derived from RWD on diverse populations could improve patient outcomes by moving the field closer to precision pediatric asthma care. Our privacy-preserving federated learning methodology and qualitative implementation work will address several challenges of applying ML to large, multicenter RWD.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/57981.