Faculty of Mathematics and Computer Science, University of Warmia and Mazury in Olsztyn, Ul. Oczapowskiego 2, 10-719, Olsztyn, Poland.
Artificial Intelligence Department, Lviv Polytechnic National University, 12 S. Bandery St, Lviv, 79013, Ukraine.
Sci Rep. 2024 Apr 29;14(1):9782. doi: 10.1038/s41598-024-60637-y.
Although COVID-19 is no longer a pandemic but an endemic disease, the epidemiological situation related to the SARS-CoV-2 virus continues to develop at an alarming rate, affecting every corner of the world. The rapid spread of the coronavirus has engaged the scientific community, which continually seeks solutions to ensure the comfort and safety of society. Understanding the joint impact of medical and non-medical interventions on COVID-19 spread is essential for making public health decisions that control the pandemic. This paper introduces two novel hybrid machine-learning ensembles that combine supervised and unsupervised learning for COVID-19 data classification and regression. The study uses the publicly available dataset "COVID-19 outbreak and potential predictive features in the USA", which describes the COVID-19 outbreak in the US and includes data for each of the 3142 US counties from the beginning of the epidemic (January 2020) until June 2021. The developed hybrid hierarchical classifiers outperform single classification algorithms. The best performance metrics achieved for the classification task were Accuracy = 0.912, ROC-AUC = 0.916, and F1-score = 0.916. For the regression task, the proposed hybrid hierarchical ensemble combining supervised and unsupervised learning improves performance by 11% in terms of MSE, 29% in terms of the area under the ROC curve, and 43% in terms of the MPP metric. Thus, the proposed approach makes it possible to predict the number of COVID-19 cases and deaths with sufficiently high accuracy from demographic, geographic, climatic, traffic, public health, social-distancing-policy adherence, and political characteristics. The study reveals that virus pressure is the most important feature in COVID-19 spread for both classification and regression analysis. Five other significant features were also identified as having the most influence on COVID-19 spread. The combined ensembling approach introduced in this study can help policymakers design prevention and control measures to avoid or minimize public health threats in the future.
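The abstract does not specify how the supervised and unsupervised parts are combined, so the following is only a minimal sketch of one common reading of a "hybrid hierarchical" regressor: first cluster the samples (unsupervised), then fit a separate regression model inside each cluster (supervised). The data are synthetic, and every name here (`kmeans_1d`, `assign`, `fit_line`, `predict_hybrid`) is hypothetical and not taken from the paper.

```python
import random
from statistics import mean

def kmeans_1d(xs, k=2, iters=20):
    """Tiny 1-D k-means (k >= 2); centroids start evenly spaced between min and max."""
    lo, hi = min(xs), max(xs)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        labels = [assign(x, centroids) for x in xs]
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return centroids

def assign(x, centroids):
    """Index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda c: abs(x - centroids[c]))

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx

def mse(ys, preds):
    """Mean squared error."""
    return mean([(y - p) ** 2 for y, p in zip(ys, preds)])

# Synthetic data (made up for illustration): two groups with different trends.
rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(100)] + [rng.uniform(4, 5) for _ in range(100)]
ys = [(2 * x if x < 2 else 10 - 3 * x) + rng.gauss(0, 0.1) for x in xs]

# Hold out a quarter of the samples for evaluation.
data = list(zip(xs, ys))
rng.shuffle(data)
train, test = data[:150], data[150:]

# Baseline: one global linear model.
global_a, global_b = fit_line([x for x, _ in train], [y for _, y in train])

# Hybrid: cluster first (unsupervised), then one linear model per cluster (supervised).
centroids = kmeans_1d([x for x, _ in train], k=2)
models = {}
for c in range(len(centroids)):
    part = [(x, y) for x, y in train if assign(x, centroids) == c]
    models[c] = fit_line([x for x, _ in part], [y for _, y in part])

def predict_hybrid(x):
    """Route the sample to its cluster, then apply that cluster's model."""
    a, b = models[assign(x, centroids)]
    return a * x + b

test_y = [y for _, y in test]
global_mse = mse(test_y, [global_a * x + global_b for x, _ in test])
hybrid_mse = mse(test_y, [predict_hybrid(x) for x, _ in test])
print(f"global MSE = {global_mse:.4f}, hybrid MSE = {hybrid_mse:.4f}")
```

On this toy data the per-cluster models should fit better than the single global line because the two groups follow different trends. That is the general motivation for cluster-then-predict designs, not a reproduction of the gains reported above.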