Data heterogeneity in federated learning with Electronic Health Records: Case studies of risk prediction for acute kidney injury and sepsis diseases in critical care.

Authors

Rajendran Suraj, Xu Zhenxing, Pan Weishen, Ghosh Arnab, Wang Fei

Affiliations

Tri-Institutional Computational Biology & Medicine Program, Cornell University, New York, New York, United States of America.

Division of Health Informatics, Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, United States of America.

Publication information

PLOS Digit Health. 2023 Mar 15;2(3):e0000117. doi: 10.1371/journal.pdig.0000117. eCollection 2023 Mar.

Abstract

With the wider availability of healthcare data such as Electronic Health Records (EHR), a growing number of data-driven approaches have been proposed to improve the quality of care delivery. Predictive modeling, which aims to build computational models for predicting clinical risk, is a popular research topic in healthcare analytics. However, privacy concerns about healthcare data may hinder the development of effective, generalizable predictive models, because such models often require rich, diverse data from multiple clinical institutions. Recently, federated learning (FL) has shown promise in addressing this concern. However, data heterogeneity across participating local sites may affect the prediction performance of federated models. Because acute kidney injury (AKI) and sepsis are highly prevalent among patients admitted to intensive care units (ICU), AI-based early prediction of these conditions is an important topic in critical care medicine. In this study, we take AKI and sepsis onset risk prediction in the ICU as two examples to explore the impact of data heterogeneity in the FL framework and to compare performance across frameworks. We built predictive models based on local, pooled, and FL frameworks using EHR data from multiple hospitals. The local framework used only data from each site itself. The pooled framework combined data from all sites. In the FL framework, each local site did not have access to other sites' data: a model was updated locally, and its parameters were shared with a central aggregator, which updated the federated model's parameters and then shared them with each site. We found that models built within the FL framework outperformed their local counterparts. We then analyzed variable importance discrepancies across sites and frameworks. Finally, we explored potential sources of heterogeneity within the EHR data. The different distributions of demographic profiles, medication use, and site information contributed to data heterogeneity.
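The FL loop described in the abstract (local update, parameter sharing with a central aggregator, redistribution to sites) can be illustrated with a minimal sketch. The abstract does not specify the model or the aggregation rule, so everything below is an assumption made for illustration: a logistic regression model, sample-size-weighted parameter averaging (in the style of federated averaging), synthetic two-site data with shifted feature distributions to mimic heterogeneity, and hypothetical function names such as `local_update`, `aggregate`, and `make_site`.

```python
import math
import random


def sigmoid(z):
    # Clamp the input to avoid overflow in math.exp.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train on one site's data only, starting from the shared weights."""
    w = list(weights)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def aggregate(site_weights, site_sizes):
    """Central aggregator: sample-size-weighted average of site parameters
    (an assumed rule; the source does not name one)."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def federated_train(sites, dim, rounds=20):
    """Only parameters leave a site; raw records never do."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        local_ws = [local_update(global_w, X, y) for X, y in sites]
        global_w = aggregate(local_ws, [len(X) for X, _ in sites])
    return global_w


def make_site(n, shift, rng):
    """Synthetic site: one feature whose mean differs per site (heterogeneity)."""
    X, y = [], []
    for _ in range(n):
        x1 = rng.gauss(shift, 1.0)
        X.append([1.0, x1])  # first column is the intercept term
        y.append(1 if x1 + rng.gauss(0.0, 0.5) > 0 else 0)
    return X, y


rng = random.Random(0)
sites = [make_site(200, -0.5, rng), make_site(100, 0.5, rng)]
global_w = federated_train(sites, dim=2)
print("federated weights:", global_w)
```

In this sketch, the local framework would correspond to calling `local_update` on a single site's data, and the pooled framework to training on the concatenation of all sites' records; comparing the three on held-out data is one way the frameworks could be contrasted.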

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/69e1/10016691/04ffdc003dc9/pdig.0000117.g001.jpg
