Campagner Andrea, Carobene Anna, Cabitza Federico
DISCo, Università degli Studi di Milano-Bicocca, Milan, Italy.
Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy.
Health Inf Sci Syst. 2021 Oct 23;9(1):37. doi: 10.1007/s13755-021-00167-3. eCollection 2021 Dec.
rRT-PCR testing for COVID-19 diagnosis is hampered by long turnaround times, potential reagent shortages, high false-negative rates and high costs. Routine hematochemical tests are a faster and less expensive diagnostic alternative. Machine Learning (ML) has therefore been applied to hematological parameters to develop diagnostic tools and help clinicians promptly manage positive patients. However, few ML models have been externally validated, so their real-world applicability remains unclear.
We externally validate 6 state-of-the-art diagnostic ML models that are based on the Complete Blood Count (CBC) and were trained on a dataset encompassing 816 COVID-19 positive cases. The external validation used two datasets, collected at two different hospitals in northern Italy and encompassing 163 and 104 COVID-19 positive cases respectively, and assessed the models in terms of both error rate and calibration.
We report an average AUC of 95% and an average Brier score of 0.11, outperforming existing ML methods and showing good cross-site transportability. The best-performing model (SVM) reported an average AUC of 97.5% (sensitivity: 87.5%, specificity: 94%), comparable with the performance of RT-PCR, and was also the best calibrated. Because CBC exams are rapidly available, the validated models can be useful for the early identification of potential COVID-19 patients, and in multiple test settings.
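As an illustration of the metrics named above (AUC, Brier score, sensitivity and specificity), the following is a minimal sketch of how they could be computed for a binary classifier on an external validation set. The function names, the 0.5 decision threshold and the toy labels and probabilities are assumptions made for this example only; they are not the study's code or data.

```python
from typing import Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome.

    Lower is better; 0 means perfectly calibrated and perfectly sharp.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (true-positive rate) and specificity (true-negative rate)
    when a case is called positive if its probability is >= threshold.

    The 0.5 threshold is an assumption of this sketch.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy external-validation data (made up for illustration, not from the study):
# 1 = COVID-19 positive, 0 = negative; y_prob is the model's predicted risk.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.1, 0.6, 0.2, 0.7]

if __name__ == "__main__":
    sens, spec = sensitivity_specificity(y_true, y_prob)
    print(f"AUC={auc(y_true, y_prob):.4f}")
    print(f"Brier={brier_score(y_true, y_prob):.4f}")
    print(f"Sensitivity={sens:.2f} Specificity={spec:.2f}")
```

In a cross-site evaluation such as the one described, the same functions would be applied separately to each external dataset and the results averaged. In practice, an established library implementation would normally be preferred over hand-written metrics.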