A Machine Learning Pipeline for Accurate COVID-19 Health Outcome Prediction using Longitudinal Electronic Health Records.

Affiliation

The Harker School, San Jose, California.

Publication information

AMIA Annu Symp Proc. 2022 Feb 21;2021:448-456. eCollection 2021.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8861740/

Abstract

Current COVID-19 predictive models primarily focus on predicting the risk of mortality and rely on COVID-19-specific medical data, such as chest imaging obtained after COVID-19 diagnosis. In this project, we developed an innovative supervised machine learning pipeline using longitudinal Electronic Health Records (EHR) to accurately predict COVID-19-related health outcomes, including mortality, ventilation, and days in hospital or ICU. In particular, we developed unique and effective data processing algorithms, including data cleaning, initial feature screening, and vector representation. We then trained models using state-of-the-art machine learning strategies combined with different parameter settings. Based on routinely collected EHR, our machine learning pipeline not only consistently outperformed those developed by other research groups using the same set of data, but also achieved accuracy similar to that of models trained on medical data that were only available after COVID-19 diagnosis. In addition, the top risk factors for COVID-19 were identified and are consistent with epidemiologic findings.
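
The stages named in the abstract (data cleaning, initial feature screening, vector representation, and training under different parameter settings) can be illustrated with a minimal sketch. This is not the authors' implementation: the records, labels, code names, the prevalence threshold `min_patients`, the count-based vectors, the logistic regression model, and the learning rates tried are all assumptions chosen only to make the flow concrete on toy data.

```python
import math
from collections import Counter

# Toy longitudinal records: (patient_id, clinical code). All values are made up.
RECORDS = [
    ("p1", "hypertension"), ("p1", "diabetes"), ("p1", "hypertension"),
    ("p2", "asthma"), ("p2", None),            # None simulates a missing code
    ("p3", "diabetes"), ("p3", "hypertension"),
    ("p4", "asthma"), ("p4", "rare_code"),
]
LABELS = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}  # hypothetical binary outcome


def clean(records):
    """Data cleaning (assumed rule): drop rows with a missing code."""
    return [(pid, code) for pid, code in records if code]


def screen_features(records, min_patients=2):
    """Feature screening (assumed rule): keep codes seen in >= min_patients patients."""
    seen = Counter(code for _pid, code in set(records))
    return sorted(code for code, n in seen.items() if n >= min_patients)


def vectorize(records, features, patient_ids):
    """Vector representation (assumed form): per-patient count of each kept code."""
    counts = {pid: Counter() for pid in patient_ids}
    for pid, code in records:
        counts[pid][code] += 1
    return [[counts[pid][f] for f in features] for pid in patient_ids]


def score(model, xi):
    """Linear score w.x + b for one patient vector."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, xi)) + b


def train_logistic(X, y, lr, epochs=500):
    """Logistic regression fitted by per-sample gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-score((w, b), xi)))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def accuracy(model, X, y):
    """Fraction of patients whose predicted class matches the label."""
    preds = [1 if score(model, xi) > 0 else 0 for xi in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


patient_ids = sorted(LABELS)
cleaned = clean(RECORDS)
features = screen_features(cleaned)
X = vectorize(cleaned, features, patient_ids)
y = [LABELS[pid] for pid in patient_ids]

# Try several parameter settings and keep the best-scoring one.
results = {lr: accuracy(train_logistic(X, y, lr), X, y) for lr in (0.01, 0.1, 0.5)}
best_lr = max(results, key=results.get)
```

The sketch scores each setting on its own training data only to stay short; a real comparison would evaluate on held-out patients.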

Similar articles

1

A Machine Learning Pipeline for Accurate COVID-19 Health Outcome Prediction using Longitudinal Electronic Health Records.

AMIA Annu Symp Proc. 2022 Feb 21;2021:448-456. eCollection 2021.

2

The Development and Validation of Simplified Machine Learning Algorithms to Predict Prognosis of Hospitalized Patients With COVID-19: Multicenter, Retrospective Study.

J Med Internet Res. 2022 Jan 21;24(1):e31549. doi: 10.2196/31549.

3

Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data.

Lancet Digit Health. 2022 Jun;4(6):e415-e425. doi: 10.1016/S2589-7500(22)00049-8. Epub 2022 Apr 21.

4

Automated feature selection of predictors in electronic medical records data.

Biometrics. 2019 Mar;75(1):268-277. doi: 10.1111/biom.12987. Epub 2019 Apr 2.

5

Electronic phenotyping of health outcomes of interest using a linked claims-electronic health record database: Findings from a machine learning pilot project.

J Am Med Inform Assoc. 2021 Jul 14;28(7):1507-1517. doi: 10.1093/jamia/ocab036.

6

Precision Assessment of COVID-19 Phenotypes Using Large-Scale Clinic Visit Audio Recordings: Harnessing the Power of Patient Voice.

J Med Internet Res. 2021 Feb 19;23(2):e20545. doi: 10.2196/20545.

7

Democratizing EHR analyses with FIDDLE: a flexible data-driven preprocessing pipeline for structured clinical data.

J Am Med Inform Assoc. 2020 Dec 9;27(12):1921-1934. doi: 10.1093/jamia/ocaa139.

8

Predicting post-stroke pneumonia using deep neural network approaches.

Int J Med Inform. 2019 Dec;132:103986. doi: 10.1016/j.ijmedinf.2019.103986. Epub 2019 Oct 1.

9

A machine learning-based framework to identify type 2 diabetes through electronic health records.

Int J Med Inform. 2017 Jan;97:120-127. doi: 10.1016/j.ijmedinf.2016.09.014. Epub 2016 Oct 1.

10

COVID-19 Mortality Prediction From Deep Learning in a Large Multistate Electronic Health Record and Laboratory Information System Data Set: Algorithm Development and Validation.

J Med Internet Res. 2021 Sep 28;23(9):e30157. doi: 10.2196/30157.
