
Ensemble learning model for diagnosing COVID-19 from routine blood tests.

Authors

AlJame Maryam, Ahmad Imtiaz, Imtiaz Ayyub, Mohammed Ameer

Affiliations

Computer Engineering Department, Kuwait University, Kuwait.

College of Medicine, Kuwait University, Kuwait.

Publication information

Inform Med Unlocked. 2020;21:100449. doi: 10.1016/j.imu.2020.100449. Epub 2020 Oct 20.

DOI:10.1016/j.imu.2020.100449
PMID:33102686
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7572278/
Abstract

BACKGROUND AND OBJECTIVES

The pandemic of novel coronavirus disease 2019 (COVID-19) has severely impacted human society with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and timely patient care and to combat the spread of the disease. In this context, recent studies have reported some key advantages of using routine blood tests for initial screening of COVID-19 patients. In this article, first we present a review of the emerging techniques for COVID-19 diagnosis using routine laboratory and/or clinical data. Then, we propose ERLX which is an ensemble learning model for COVID-19 diagnosis from routine blood tests.

METHOD

The proposed model uses three well-known, diverse classifiers at the first level (extra trees, random forest and logistic regression), which have different architectures and learning characteristics, and then combines their predictions with a second-level extreme gradient boosting (XGBoost) classifier to achieve better performance. For data preparation, the proposed methodology employs the KNNImputer algorithm to handle null values in the dataset, isolation forest (iForest) to remove outlier data, and the synthetic minority oversampling technique (SMOTE) to balance the data distribution. For model interpretability, feature importance is reported using the SHapley Additive exPlanations (SHAP) technique.
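
The two-level pipeline described above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the authors' implementation: `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency, `smote_like` is a hypothetical, simplified stand-in for SMOTE, and the hyperparameters, the train/test split, and the choice to oversample only the training split are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              IsolationForest, RandomForestClassifier,
                              StackingClassifier)
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like(X, y, minority=1, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): interpolate between minority
    samples and their nearest minority neighbours until classes are balanced.
    Assumes at least two minority samples."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbours: each point is normally its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), n_new)
    picked = neighbours[base, rng.integers(0, neighbours.shape[1], n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


# Synthetic stand-in for blood-test data: about 10% positives, 5% missing values.
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# 1. Fill missing values from the k nearest rows (fitted on training data only).
imputer = KNNImputer(n_neighbors=5).fit(X_train)
X_train, X_test = imputer.transform(X_train), imputer.transform(X_test)

# 2. Drop training rows that the isolation forest flags as outliers (-1).
inliers = IsolationForest(random_state=0).fit_predict(X_train) == 1
X_train, y_train = X_train[inliers], y_train[inliers]

# 3. Balance the classes in the training split.
X_bal, y_bal = smote_like(X_train, y_train)

# 4. Level 1: three diverse classifiers; level 2: a boosted meta-classifier
#    trained on their cross-validated predictions.
stack = StackingClassifier(
    estimators=[
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=5,
)
stack.fit(X_bal, y_bal)
test_accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

To move closer to the components the abstract names, `xgboost.XGBClassifier` could replace the final estimator and `imblearn.over_sampling.SMOTE` could replace `smote_like`, provided those packages are installed.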

RESULTS

The proposed model was trained and evaluated by using a publicly available data set from Albert Einstein Hospital in Brazil, which consisted of 5644 data samples with 559 confirmed COVID-19 cases. The ensemble model achieved outstanding performance with an overall accuracy of 99.88% [95% CI: 99.6-100], AUC of 99.38% [95% CI: 97.5-100], a sensitivity of 98.72% [95% CI: 94.6-100] and a specificity of 99.99% [95% CI: 99.99-100].
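
For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity and specificity follow from confusion-matrix counts, with a 95% interval attached to each. The abstract does not say how its confidence intervals were computed; the Wilson score interval used here is one common choice and is an assumption, as are the helper names and the example counts. AUC is omitted because it requires predicted scores rather than counts.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def screening_metrics(tp, fn, tn, fp):
    """Return {metric: (point_estimate, ci_low, ci_high)} from raw counts."""
    counts = {
        "accuracy": (tp + tn, tp + fn + tn + fp),
        "sensitivity": (tp, tp + fn),  # true positive rate
        "specificity": (tn, tn + fp),  # true negative rate
    }
    return {name: (k / n, *wilson_interval(k, n)) for name, (k, n) in counts.items()}


# Hypothetical counts for illustration only; not taken from the paper.
metrics = screening_metrics(tp=95, fn=5, tn=890, fp=10)
for name, (point, low, high) in metrics.items():
    print(f"{name}: {100 * point:.2f}% [95% CI: {100 * low:.2f}-{100 * high:.2f}]")
```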

DISCUSSION

The proposed model outperformed existing state-of-the-art studies (Banerjee et al., 2020; de Freitas Barbosa et al., 2020; de Moraes Batista et al., 2020; Soares et al., 2020) [3,22,56,71] when evaluated on the same sets of features that they employed. Against the best-performing Bayes Net model (de Freitas Barbosa et al., 2020) [22], which reported an average accuracy of 95.159%, ERLX achieved an average accuracy of 99.94%. Against the AUC of 85% reported for the SVM model (de Moraes Batista et al., 2020) [56], ERLX obtained an AUC of 99.77%, along with improvements in sensitivity and specificity. Against the ER-COV model (Soares et al., 2020) [71], which reported an average sensitivity of 70.25% and specificity of 85.98%, ERLX achieved a sensitivity of 99.47% and a specificity of 99.99%. ERLX also scored considerably higher than the ANN model (Banerjee et al., 2020) [3] on all performance metrics. Therefore, the presented model is robust and can be deployed for reliable, early and rapid screening of COVID-19 patients.
