Research Institute, National Health Insurance Service Ilsan Hospital, Goyang, Korea.
Department of Big Data, National Health Insurance Service, Wonju, Korea.
Sci Rep. 2020 Oct 30;10(1):18716. doi: 10.1038/s41598-020-75767-2.
The rapid spread of COVID-19 has resulted in a shortage of medical resources, which necessitates accurate prognosis prediction to triage patients effectively. This study used the nationwide cohort of South Korea to develop a machine learning model that predicts prognosis from sociodemographic and medical information. Of 10,237 COVID-19 patients, 228 (2.2%) died, 7772 (75.9%) recovered, and 2237 (21.9%) were still in isolation or being treated at the last follow-up (April 16, 2020). Cox proportional hazards regression analysis revealed that age > 70, male sex, moderate or severe disability, the presence of symptoms, nursing home residence, and comorbidities of diabetes mellitus (DM), chronic lung disease, or asthma were significantly associated with an increased risk of mortality (p ≤ 0.047). For machine learning, the least absolute shrinkage and selection operator (LASSO), linear support vector machine (SVM), SVM with radial basis function kernel, random forest (RF), and k-nearest neighbors were tested. In predicting mortality, LASSO and linear SVM demonstrated high sensitivities (90.7% [95% confidence interval: 83.3, 97.3] and 92.0% [85.9, 98.1], respectively) and specificities above 90% (91.4% [90.3, 92.5] and 91.8% [90.7, 92.9], respectively), as well as high areas under the receiver operating characteristic curve (0.963 [0.946, 0.979] and 0.962 [0.945, 0.979], respectively). The most significant predictors for LASSO were old age and preexisting DM or cancer; for RF they were old age, infection route (cluster infection or infection from personal contact), and underlying hypertension. During a pandemic, when limited medical resources must be allocated wisely and without delay, the proposed prediction model may help triage patients quickly without waiting for the results of additional tests such as laboratory or radiologic studies.
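The three performance measures reported in the abstract (sensitivity, specificity, and area under the receiver operating characteristic curve) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the labels, scores, the 0.5 threshold, and the function names are hypothetical toy values, not the study's data or code, and the rank-based AUC shown here is one standard way to compute the quantity, not necessarily the one the authors used.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = died, 0 = survived."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # deaths correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # deaths missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # survivors correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # survivors wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 3 deaths among 10 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]

threshold = 0.5  # assumed cut-off for turning a risk score into a prediction
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
roc_auc = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={roc_auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the abstract reports all three.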