Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets.

Authors

Dorn Marcio, Grisci Bruno Iochins, Narloch Pedro Henrique, Feltes Bruno César, Avila Eduardo, Kahmann Alessandro, Alho Clarice Sampaio

Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.

Center of Biotechnology, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.

Publication information

PeerJ Comput Sci. 2021 Aug 12;7:e670. doi: 10.7717/peerj-cs.670. eCollection 2021.

DOI:10.7717/peerj-cs.670
PMID:34458574
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8372002/

Abstract

The coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the country third most affected by the pandemic. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and aid decision making, offering a low-cost alternative. Given the urgency of fighting the pandemic, a large number of studies apply machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the most commonly employed machine learning classifiers for CBC data, together with popular sampling methods for dealing with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides a panorama of which classifiers and sampling methods give the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the panorama discussed here can significantly benefit the comparison of results from new ML algorithms.
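
The abstract does not name the five sampling techniques or the metrics it compares, so the following is only a minimal sketch of the general idea, under stated assumptions: random oversampling is used as a generic example of a sampling method (it may or may not be among those evaluated in the paper), and the function names `random_oversample` and `binary_metrics` are hypothetical. It shows how minority-class rows can be duplicated to balance a dataset, and why accuracy alone can mislead on imbalanced data.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    is as frequent as the largest one (illustrative, not the paper's code)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        # Rows of this class available for duplication.
        pool = [x for x, y in zip(features, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy imbalanced data (made up for illustration): 8 negatives, 2 positives.
    x = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2

    # A classifier that always predicts the majority class looks accurate
    # but never detects a positive case.
    print(binary_metrics(y, [0] * 10))

    # After oversampling, both classes have the same number of rows.
    _, balanced_y = random_oversample(x, y)
    print(Counter(balanced_y))
```

In practice, oversampling is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.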

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51d5/8372002/2d11e20fa7ae/peerj-cs-07-670-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51d5/8372002/6a9772a169bf/peerj-cs-07-670-g002.jpg
