

Using Automated Machine Learning to Predict the Mortality of Patients With COVID-19: Prediction Model Development Study.

Affiliations

Department of Pathology, Albert Einstein College of Medicine, Montefiore Medical Center, The Bronx, NY, United States.

Tsubomi Technology, The Bronx, NY, United States.

Publication information

J Med Internet Res. 2021 Feb 26;23(2):e23458. doi: 10.2196/23458.

Abstract

BACKGROUND

During a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one machine learning algorithm and limited performance evaluation to area under the curve analysis. To obtain the best results possible, it may be important to test different machine learning algorithms to find the best prediction model.

OBJECTIVE

In this study, we aimed to use automated machine learning (autoML) to train various machine learning algorithms. We selected the model that best predicted patients' chances of surviving a SARS-CoV-2 infection. In addition, we identified which variables (ie, vital signs, biomarkers, comorbidities, etc) were the most influential in generating an accurate model.

METHODS

Data were retrospectively collected from all patients who tested positive for COVID-19 at our institution between March 1 and July 3, 2020. We collected 48 variables from each patient within 36 hours before or after the index time (ie, real-time polymerase chain reaction positivity). Patients were followed for 30 days or until death. Patients' data were used to build 20 machine learning models with various algorithms via autoML. The performance of machine learning models was measured by analyzing the area under the precision-recall curve (AUPRC). Subsequently, we established model interpretability via Shapley additive explanations and partial dependence plots to identify and rank the variables that drove model predictions. Afterward, we conducted dimensionality reduction to extract the 10 most influential variables. AutoML models were retrained using only these 10 variables, and the output models were evaluated against the model that used 48 variables.
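To illustrate how Shapley additive explanations attribute a single prediction to individual variables, the sketch below computes exact Shapley values for a toy three-variable risk function. The function `toy_risk`, its coefficients, the example patient, and the baseline values are all hypothetical and carry no clinical meaning. The abstract does not describe the study's explanation tooling, and practical implementations usually approximate this computation rather than enumerate every subset of variables.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one instance x relative to a baseline.

    A variable that is 'absent' from a coalition takes its baseline value.
    Cost grows as 2**n, so this is only practical for a handful of variables.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [
                    x[j] if (j in subset or j == i) else baseline[j]
                    for j in range(n)
                ]
                without_i = [
                    x[j] if j in subset else baseline[j] for j in range(n)
                ]
                # Marginal contribution of variable i to this coalition.
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(v: Sequence[float]) -> float:
    """Hypothetical risk score (illustration only, not a clinical model)."""
    age, systolic_bp, spo2 = v
    hypoxia = 100 - spo2
    return 0.01 * age - 0.005 * systolic_bp + 0.02 * hypoxia + 0.0001 * age * hypoxia


patient = [80, 90, 88]     # hypothetical age, systolic BP, pulse oximetry
reference = [60, 120, 97]  # hypothetical baseline ("average") patient

phi = shapley_values(toy_risk, patient, reference)
for name, value in zip(["age", "systolic_bp", "spo2"], phi):
    print(f"{name}: {value:+.4f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the property that makes it possible to rank variables by how much they drive a prediction.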

RESULTS

Data from 4313 patients were used to develop the models. The best model generated by using autoML and 48 variables was the stacked ensemble model (AUPRC=0.807). The two best independent models were the gradient boosting machine and extreme gradient boosting models, which had AUPRCs of 0.803 and 0.793, respectively. The deep learning model (AUPRC=0.73) was substantially inferior to the other models. The 10 most influential variables for generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry level, blood urea nitrogen level, lactate dehydrogenase level, D-dimer level, troponin level, respiratory rate, and Charlson comorbidity score. After the autoML models were retrained with these 10 variables, the stacked ensemble model still had the best performance (AUPRC=0.791).
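The AUPRC values reported above summarize each model's precision-recall curve in a single number. As a rough illustration, the sketch below computes the step-wise area under that curve (often called average precision) from outcome labels and predicted risks. The labels and scores are made up, tied scores are not handled specially, and the software used in the study may interpolate the curve differently.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    y_true holds 1 for the positive class (here, death) and 0 otherwise;
    scores holds the model's predicted risk for each patient.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank patients from highest to lowest predicted risk.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_pos = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_pos += 1
            # Precision at the rank where recall increases by 1 / total_pos.
            area += true_pos / rank
    return area / total_pos


# Made-up example: 3 deaths among 8 patients, already sorted by risk.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
risks = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(round(average_precision(labels, risks), 4))
```

Unlike the area under the ROC curve, this value does not reward correctly ranked negatives, so a model that ranks every positive case first scores 1.0 regardless of how many negatives there are.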

CONCLUSIONS

We used autoML to develop high-performing models that predicted the survival of patients with COVID-19. In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method for generating machine learning-based clinical decision support tools.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ac8/7919846/6d757504f785/jmir_v23i2e23458_fig1.jpg
