Interdisciplinary Stem Cells and Regenerative Medicine, Ankara University Stem Cell Institute, Ankara, Turkey.
Departments of Medical Biochemistry and Clinical Microbiology, Başkent University Faculty of Medicine, Ankara, Turkey.
Am J Clin Pathol. 2022 May 4;157(5):758-766. doi: 10.1093/ajcp/aqab187.
This study aimed to develop a clinical decision support tool that assists coronavirus disease 2019 (COVID-19) diagnosis by applying machine learning (ML) models to routine laboratory test results.
We developed ML models using laboratory data (n = 1,391) comprising six clinical chemistry (CC) results, 14 complete blood count (CBC) parameters, and the result of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) real-time reverse transcription-polymerase chain reaction (RT-PCR) test, which served as the gold standard. Four ML algorithms (random forest [RF], gradient boosting [XGBoost], support vector machine [SVM], and logistic regression) were each trained on two feature sets, CBC parameters alone and CC combined with CBC parameters, giving eight ML models. Performance was evaluated on the test data set and on an external validation data set from Brazil.
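As a minimal sketch of how the eight models arise, the snippet below enumerates the four algorithms against the two feature sets. The names `ALGORITHMS`, `FEATURE_SETS`, and `model_grid` are illustrative assumptions, not identifiers from the study, and no training is performed here; only the counts (six CC results, 14 CBC parameters) come from the text.

```python
from itertools import product

# Algorithm names as listed in the abstract (labels only, no training code).
ALGORITHMS = ["random forest", "XGBoost", "SVM", "logistic regression"]

# Feature-set sizes taken from the abstract: 14 CBC parameters,
# or 6 CC results combined with the 14 CBC parameters.
FEATURE_SETS = {"CBC": 14, "CC+CBC": 6 + 14}

# Hypothetical grid: one entry per (algorithm, feature set) pair.
model_grid = [
    {"algorithm": algo, "features": name, "n_features": n}
    for algo, (name, n) in product(ALGORITHMS, FEATURE_SETS.items())
]

for model in model_grid:
    print(f"{model['algorithm']:<20} {model['features']:<7} {model['n_features']} features")

print(len(model_grid), "models in total")
```

Each entry would correspond to one trained classifier evaluated against the RT-PCR label.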
Accuracy across all models ranged from 74% to 91%. The RF model trained on CC and CBC analytes performed best on the present study's data set (accuracy, 85.3%; sensitivity, 79.6%; specificity, 91.2%). The RF model trained on CBC parameters alone detected COVID-19 cases with 82.8% accuracy. On the external validation data set, the best performance belonged to the SVM model trained on CC and CBC parameters (accuracy, 91.18%; sensitivity, 100%; specificity, 84.21%).
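For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, and specificity follow from a confusion matrix, with RT-PCR positivity as the reference label. The function name `diagnostic_metrics` and the counts are assumptions: the abstract does not report confusion matrices, and the example counts were chosen only because they happen to yield the percentages quoted for the external validation set.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    total = tp + fp + tn + fn
    accuracy = 100 * (tp + tn) / total   # all correct calls / all samples
    sensitivity = 100 * tp / (tp + fn)   # detected positives / all positives
    specificity = 100 * tn / (tn + fp)   # detected negatives / all negatives
    return accuracy, sensitivity, specificity


# Hypothetical counts (NOT from the study): 15 RT-PCR positives, all detected,
# and 19 RT-PCR negatives, of which 3 were flagged as positive.
acc, sens, spec = diagnostic_metrics(tp=15, fp=3, tn=16, fn=0)
print(f"accuracy={acc:.2f}% sensitivity={sens:.2f}% specificity={spec:.2f}%")
# prints: accuracy=91.18% sensitivity=100.00% specificity=84.21%
```

A sensitivity of 100% paired with a lower specificity means that, in such a scenario, every error is a false positive rather than a missed case.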
The ML models presented in this study can serve as clinical decision support tools that supplement physicians' clinical judgment in diagnosing COVID-19.