Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, UC San Diego School of Medicine, San Diego, CA, 92093, USA.
Research Service, VA San Diego Healthcare System, San Diego, CA, 92161, USA.
BMC Med Inform Decis Mak. 2020 Sep 29;20(1):247. doi: 10.1186/s12911-020-01266-z.
The recent Coronavirus Disease 2019 (COVID-19) pandemic has placed severe stress on healthcare systems worldwide, a stress amplified by the critical shortage of COVID-19 tests.
In this study, we propose a more accurate diagnostic model of COVID-19 based on patient symptoms and routine test results, built by applying machine learning to reanalyze COVID-19 data from 151 published studies. We aim to investigate correlations between clinical variables, cluster COVID-19 patients into subtypes, and generate a computational classification model that discriminates between COVID-19 patients and influenza patients based on clinical variables alone.
We discovered several novel associations between clinical variables, including correlations between being male and having higher levels of serum lymphocytes and neutrophils. We found that COVID-19 patients could be clustered into subtypes based on serum levels of immune cells, gender, and reported symptoms. Finally, we trained an XGBoost model that achieved a sensitivity of 92.5% and a specificity of 97.9% in discriminating COVID-19 patients from influenza patients.
We demonstrated that computational methods trained on large clinical datasets could yield more accurate COVID-19 diagnostic models and so mitigate the impact of limited testing. We also presented previously unknown COVID-19 clinical variable correlations and clinical subgroups.
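For readers less familiar with the two metrics reported for the classifier, the following minimal sketch shows how sensitivity and specificity are computed from true and predicted labels. The function name `sensitivity_specificity` and the toy labels are illustrative assumptions and are not taken from the study; here label 1 stands for COVID-19 and label 0 for influenza.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary classifier.

    Sensitivity = TP / (TP + FN): fraction of true positives detected.
    Specificity = TN / (TN + FP): fraction of true negatives detected.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels, made up for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy case, three of the four positive cases are detected and five of the six negative cases are correctly rejected. Applied to the reported figures, a sensitivity of 92.5% means that 92.5% of COVID-19 patients were classified as COVID-19, and a specificity of 97.9% means that 97.9% of influenza patients were classified as influenza.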