Synthesizing Electronic Health Records for Predictive Models in Low-Middle-Income Countries (LMICs).

Authors

Ghosheh Ghadeer O, Thwaites C Louise, Zhu Tingting

Affiliations

Department of Engineering Sciences, University of Oxford, Oxford OX1 3PJ, UK.

Oxford University Clinical Research Unit (OUCRU), Ho Chi Minh City 710400, Vietnam.

Publication information

Biomedicines. 2023 Jun 18;11(6):1749. doi: 10.3390/biomedicines11061749.

Abstract

The spread of machine learning models, coupled with the growing adoption of electronic health records (EHRs), has opened the door for developing clinical decision support systems. However, despite the great promise of machine learning for healthcare in low-middle-income countries (LMICs), many data-specific limitations, such as small dataset size and irregular sampling, hinder progress in such applications. Recently, deep generative models have been proposed to generate realistic-looking synthetic data, including EHRs, by learning the underlying data distribution without compromising patient privacy. In this study, we first use a deep generative model to generate synthetic data based on a small dataset (364 patients) from an LMIC setting. Next, we use the synthetic data to build models that predict the onset of hospital-acquired infections based on minimal information collected at patient ICU admission. The diagnostic model trained on the synthetic data outperformed models trained on the original data and on data oversampled with techniques such as SMOTE. We also vary the size of the synthetic data and observe the impact on the performance and interpretability of the models. Our results show the promise of deep generative models in enabling healthcare data owners to develop and validate models that serve their needs and applications, despite limitations in dataset size.
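The abstract names SMOTE as one of the oversampling baselines. As a rough illustration of the idea behind it, and not the authors' code or a full SMOTE implementation, the sketch below creates new minority-class rows by interpolating between a sample and one of its nearest minority-class neighbours. The function name `smote_like_oversample`, its parameters, and the two-feature example rows are all hypothetical and are not taken from the study.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours.

    minority: list of equal-length numeric feature vectors (hypothetical data).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class rows (made-up values, two numeric features each).
minority_rows = [[54.0, 110.0], [61.0, 98.0], [47.0, 121.0], [70.0, 105.0]]
new_rows = smote_like_oversample(minority_rows, n_new=6)
print(len(new_rows))
```

Because each synthetic row lies on a line segment between two existing rows, this kind of oversampling cannot produce values outside the range of the observed minority samples. A deep generative model, by contrast, learns the data distribution and samples from it.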

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f04b/10295936/6987e126e914/biomedicines-11-01749-g001.jpg
