Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets.

Authors

Dorn Marcio, Grisci Bruno Iochins, Narloch Pedro Henrique, Feltes Bruno César, Avila Eduardo, Kahmann Alessandro, Alho Clarice Sampaio

Affiliations

Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.

Center of Biotechnology, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil.

Publication information

PeerJ Comput Sci. 2021 Aug 12;7:e670. doi: 10.7717/peerj-cs.670. eCollection 2021.

DOI:10.7717/peerj-cs.670

PMID:34458574

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8372002/

Abstract

The coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the third most affected country. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and aid decision making, offering a low-cost alternative. Given the urgency of fighting the pandemic, a large number of studies apply machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the machine learning classifiers most often employed for CBC data, together with popular sampling methods for dealing with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides a panorama of which classifiers and sampling methods give the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the panorama discussed here can significantly benefit the comparison of the results of new ML algorithms.

