Public Health Research Institute of Jiangsu Province, Nanjing, China.
Institute of HIV/AIDS/STI Prevention and Control, Jiangsu Provincial Center for Disease Control and Prevention, Nanjing, China.
J Med Internet Res. 2021 Apr 7;23(4):e23948. doi: 10.2196/23948.
Diagnosing patients with COVID-19 effectively and efficiently, and assigning the correct clinical type of the disease, is essential for achieving optimal patient outcomes and for reducing the risk of overloading the health care system. Currently, severe and nonsevere COVID-19 types are differentiated by only a few features, which do not comprehensively characterize the complicated pathological, physiological, and immunological responses to SARS-CoV-2 infection in the different disease types. In addition, these type-defining features may not be readily testable at the time of diagnosis.
In this study, we aimed to use a machine learning approach to understand COVID-19 more comprehensively, accurately differentiate severe and nonsevere COVID-19 clinical types based on multiple medical features, and provide reliable predictions of the clinical type of the disease.
For this study, we recruited 214 patients with confirmed nonsevere COVID-19 and 148 patients with confirmed severe COVID-19. The clinical characteristics (26 features) and laboratory test results (26 features) upon admission were acquired as two input modalities. Exploratory analyses demonstrated that these features differed substantially between the two clinical types. Random forest models were developed and validated to differentiate COVID-19 clinical types, using either all the features in each modality or the top 5 features from each modality combined.
Using clinical and laboratory results independently as input, the random forest models achieved >90% and >95% predictive accuracy, respectively. The importance scores of the input features were further evaluated, and the top 5 features from each modality were identified (age, hypertension, cardiovascular disease, gender, and diabetes for the clinical features modality, and dimerized plasmin fragment D, high sensitivity troponin I, absolute neutrophil count, interleukin 6, and lactate dehydrogenase for the laboratory testing modality, in descending order). Using these top 10 multimodal features as the only input instead of all 52 features combined, the random forest model was able to achieve 97% predictive accuracy.
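The workflow described above (one random forest per modality, ranking by importance score, then a reduced model on the top 5 features from each modality) can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic stand-in data with the same group sizes and feature counts as the study. The 70/30 stratified split, the number of trees, and the helper name `top_k_features` are assumptions made for illustration and are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data, NOT the study data: 214 nonsevere (label 0)
# and 148 severe (label 1) patients, 26 features per modality.
n_nonsevere, n_severe, n_features = 214, 148, 26
y = np.array([0] * n_nonsevere + [1] * n_severe)

# Only the first 5 columns of each modality carry signal (an assumption
# made so that the importance ranking has something to find).
informative = (np.arange(n_features) < 5).astype(float)
X_clinical = rng.normal(size=(y.size, n_features)) + 0.8 * y[:, None] * informative
X_lab = rng.normal(size=(y.size, n_features)) + 1.2 * y[:, None] * informative


def top_k_features(X_train, y_train, k=5, seed=0):
    """Fit a random forest and return the column indices of the k
    features with the highest impurity-based importance scores."""
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    order = np.argsort(model.feature_importances_)[::-1]  # descending
    return order[:k]


# Hypothetical validation scheme: a single stratified 70/30 split.
idx_train, idx_test = train_test_split(
    np.arange(y.size), test_size=0.3, stratify=y, random_state=0
)

# Rank features on the training rows only, to avoid leaking test data.
top_clin = top_k_features(X_clinical[idx_train], y[idx_train])
top_lab = top_k_features(X_lab[idx_train], y[idx_train])

# Combine the top 5 features from each modality into a 10-feature input.
X_combined = np.hstack([X_clinical[:, top_clin], X_lab[:, top_lab]])

final_model = RandomForestClassifier(n_estimators=200, random_state=0)
final_model.fit(X_combined[idx_train], y[idx_train])
accuracy = final_model.score(X_combined[idx_test], y[idx_test])
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the reported results. Ranking features on the training rows alone is a design choice in this sketch that keeps the held-out estimate from being inflated by feature selection.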
Our findings shed light on how the human body reacts to SARS-CoV-2 infection as a unit and provide insights into effectively evaluating the disease severity of patients with COVID-19 from more common medical features when gold standard features are not available. We suggest that clinical information can be used as an initial screening tool for self-evaluation and triage, while laboratory test results should be applied when accuracy is the priority.