Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data.

Author information

Rasmy Laila, Nigo Masayuki, Kannadath Bijun Sai, Xie Ziqian, Mao Bingyu, Patel Khush, Zhou Yujia, Zhang Wanheng, Ross Angela, Xu Hua, Zhi Degui

Affiliations

School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.

McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA.

Publication details

Lancet Digit Health. 2022 Jun;4(6):e415-e425. doi: 10.1016/S2589-7500(22)00049-8. Epub 2022 Apr 21.

Abstract

BACKGROUND

Predicting outcomes of patients with COVID-19 at an early stage is crucial for optimised clinical care and resource management, especially during a pandemic. Although multiple machine learning models have been proposed to address this issue, because of their requirements for extensive data preprocessing and feature engineering, they have not been validated or implemented outside of their original study site. Therefore, we aimed to develop accurate and transferable predictive models of outcomes on hospital admission for patients with COVID-19.

METHODS

In this study, we developed recurrent neural network-based models (CovRNN) to predict the outcomes of patients with COVID-19 by use of available electronic health record data on admission to hospital, without the need for specific feature selection or missing data imputation. CovRNN was designed to predict three outcomes: in-hospital mortality, need for mechanical ventilation, and prolonged hospital stay (>7 days). For in-hospital mortality and mechanical ventilation, CovRNN produced time-to-event risk scores (survival prediction; evaluated by the concordance index) and all-time risk scores (binary prediction; area under the receiver operating characteristic curve [AUROC] was the main metric); we only trained a binary classification model for prolonged hospital stay. For binary classification tasks, we compared CovRNN against traditional machine learning algorithms: logistic regression and light gradient boosting machine (LightGBM). Our models were trained and validated on the heterogeneous, deidentified data of 247 960 patients with COVID-19 from 87 US health-care systems derived from the Cerner Real-World COVID-19 Q3 Dataset up to September 2020. We held out the data of 4175 patients from two hospitals for external validation. The remaining 243 785 patients from the 85 health systems were grouped into training (n=170 626), validation (n=24 378), and multi-hospital test (n=48 781) sets. Model performance was evaluated in the multi-hospital test set. The transferability of CovRNN was externally validated by use of deidentified data from 36 140 patients derived from the US-based Optum deidentified COVID-19 electronic health record dataset (version 1015; from January, 2007, to Oct 15, 2020). Exact dates of data extraction were masked by the databases to ensure patient data safety.
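The central idea in the Methods is that the data available at admission can be fed to a recurrent network directly, with no feature selection or imputation. Below is a minimal sketch of that idea under stated assumptions: each encounter is treated as a set of medical-code strings, the code embeddings are summed, and a plain Elman RNN reads the encounters in order. The Elman cell is a simpler stand-in, because this abstract does not specify CovRNN's cell type, embedding scheme, or layer sizes. The class name `TinyCodeRNN` and code strings such as `dx:U07.1` and `lab:crp_high` are hypothetical, and because the weights are random and untrained, the scores illustrate the data flow only.

```python
import math
import random
from typing import Dict, List, Sequence


class TinyCodeRNN:
    """Illustrative, untrained Elman RNN over a sequence of encounters.

    Each encounter is a collection of medical-code strings (hypothetical format).
    """

    def __init__(self, vocab: Sequence[str], dim: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand_vec(n: int) -> List[float]:
            return [rng.uniform(-0.1, 0.1) for _ in range(n)]

        self.dim = dim
        self.embed: Dict[str, List[float]] = {code: rand_vec(dim) for code in vocab}
        self.w_in = [rand_vec(dim) for _ in range(dim)]  # input-to-hidden weights
        self.w_hh = [rand_vec(dim) for _ in range(dim)]  # hidden-to-hidden weights
        self.w_out = rand_vec(dim)                       # hidden-to-logit weights

    def _encounter_vector(self, codes: Sequence[str]) -> List[float]:
        # Sum the embeddings of the codes recorded at one encounter. A test that
        # was never ordered simply has no code here, so nothing is imputed;
        # codes outside the vocabulary are skipped.
        vec = [0.0] * self.dim
        for code in codes:
            emb = self.embed.get(code)
            if emb is not None:
                vec = [a + b for a, b in zip(vec, emb)]
        return vec

    def predict(self, encounters: Sequence[Sequence[str]]) -> float:
        """Return a probability-like risk score for one patient."""
        h = [0.0] * self.dim
        for codes in encounters:  # encounters in chronological order
            x = self._encounter_vector(codes)
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[j], x))
                    + sum(u * hj for u, hj in zip(self.w_hh[j], h))
                )
                for j in range(self.dim)
            ]
        logit = sum(w * hj for w, hj in zip(self.w_out, h))
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


model = TinyCodeRNN(["dx:U07.1", "lab:crp_high", "med:dexamethasone"], dim=4)
patient = [["dx:U07.1"], ["lab:crp_high", "med:dexamethasone"]]
print(round(model.predict(patient), 3))
```

Because a missing laboratory result is just an absent code, not an empty column, patients with different amounts of history can be scored without imputing values. The trade-off is that hand-built features are replaced by embeddings that must be learned from a large dataset.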

FINDINGS

CovRNN binary models achieved AUROCs of 93·0% (95% CI 92·6-93·4) for the prediction of in-hospital mortality, 92·9% (92·6-93·2) for the prediction of mechanical ventilation, and 86·5% (86·2-86·9) for the prediction of a prolonged hospital stay, outperforming the light gradient boosting machine and logistic regression models. External validation confirmed AUROCs in similar ranges (91·3-97·0% for in-hospital mortality prediction, 91·5-96·0% for the prediction of mechanical ventilation, and 81·0-88·3% for the prediction of prolonged hospital stay). For survival prediction, CovRNN achieved a concordance index of 86·0% (95% CI 85·1-86·9) for in-hospital mortality and 92·6% (92·2-93·0) for mechanical ventilation.
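The Findings report AUROC for the binary models and the concordance index for the time-to-event models, each with a 95% CI. The sketch below implements both metrics in pure Python. AUROC is computed as the probability that a randomly chosen positive patient is scored above a randomly chosen negative one. The concordance index is a simplified Harrell's C for right-censored times, in which pairs with tied times are skipped. The percentile bootstrap used for the interval is an assumption, because the abstract does not say how the intervals were computed. The function names are illustrative. In practice, a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Ties count as half a win (Mann-Whitney formulation). O(n_pos * n_neg).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def harrell_c(times: Sequence[float], events: Sequence[int],
              risks: Sequence[float]) -> float:
    """Simplified Harrell's C-index for right-censored data.

    Pair (i, j) is comparable if subject i had the event and t_i < t_j; it is
    concordant if the earlier-failing subject i has the higher risk score.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier event in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUROC, resampling patients with replacement."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[k] for k in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))                   # 0.75
print(harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1]))  # 0.8
```

The pairwise loops scale quadratically, so they are for illustration only. At the scale of the 48 781-patient test set, a rank-based or sorted implementation is preferable.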

INTERPRETATION

Trained on a large, heterogeneous, real-world dataset, our CovRNN models showed high prediction accuracy and transferability, with consistently good performance on multiple external datasets. Our results show the feasibility of a COVID-19 predictive model that delivers high accuracy without the need for complex feature engineering.

FUNDING

Cancer Prevention and Research Institute of Texas.
