The feasibility of using machine learning to predict COVID-19 cases.

Author information

Chen Shan, Ding Yuanzhao

Affiliations

Science of Learning in Education Centre, National Institute of Education, Nanyang Technological University, 637616, Singapore.

School of Geography and the Environment, University of Oxford, South Parks Road, Oxford OX1 3QY, United Kingdom.

Publication information

Int J Med Inform. 2025 Apr;196:105786. doi: 10.1016/j.ijmedinf.2025.105786. Epub 2025 Jan 23.

Abstract

BACKGROUND

Coronavirus Disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, emerged as a global health crisis in 2019, resulting in widespread morbidity and mortality. A persistent challenge during the pandemic has been the accuracy of reported epidemic data, particularly in underdeveloped regions with limited access to COVID-19 test kits and healthcare infrastructure. In the post-COVID era, this issue remains crucial. This study introduces a novel approach by leveraging machine learning to predict cases and uncover critical discrepancies, focusing on African regions where reported daily cases per million often deviate significantly from machine learning-predicted cases. These findings strongly suggest widespread underreporting of cases. By identifying these gaps, our research provides valuable insights for future pandemic preparedness, improving epidemic forecasting accuracy, data reliability, and response strategies to mitigate the impact of emerging global health crises.

OBJECTIVE

This study aims to assess the reliability of reported COVID-19 incidence data globally, particularly in underdeveloped regions, and to identify discrepancies between reported and predicted cases using machine learning methodologies.

METHODS

Data collected from March 2020 to September 2022 included demographic, healthcare, economic, and testing-related parameters. Several machine learning models (neural networks, decision trees, random forests, cross-validation, support vector machines, and logistic regression) were employed to predict COVID-19 incidence rates. Model performance was evaluated using testing accuracy metrics.

RESULTS

Testing accuracy rates for the models were as follows: neural networks (65.50%), decision trees (63.76%), random forests (63.33%), cross-validation (55.92%), support vector machines (63.62%), and logistic regression (64.70%). Comparative analysis using neural networks revealed significant discrepancies between reported and predicted COVID-19 cases, particularly in numerous African countries. These results suggest a considerable volume of underreported cases in regions with limited testing capabilities.
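One way such a reported-versus-predicted comparison could be expressed is sketched below. The abstract does not define how a discrepancy was measured, so this is an assumption throughout: incidence is treated as an ordinal class, a region is flagged when the predicted class sits above the reported one, and the region names and class labels are hypothetical placeholders rather than study data.

```python
# Assumed ordinal incidence bins; the abstract does not specify the actual classes.
CLASS_ORDER = ["low", "medium", "high"]

def flag_possible_underreporting(records, min_gap=1):
    """Return (region, gap) pairs where the predicted class is at least
    min_gap bins above the reported class, largest gap first."""
    rank = {name: i for i, name in enumerate(CLASS_ORDER)}
    flagged = []
    for region, reported, predicted in records:
        gap = rank[predicted] - rank[reported]
        if gap >= min_gap:
            flagged.append((region, gap))
    return sorted(flagged, key=lambda item: -item[1])

# Hypothetical placeholder records: (region, reported class, predicted class).
records = [
    ("Region A", "low", "high"),
    ("Region B", "medium", "medium"),
    ("Region C", "low", "medium"),
    ("Region D", "high", "medium"),
]

flagged = flag_possible_underreporting(records)
print(flagged)
```

Note that with model accuracy near 65%, a gap for any single region may reflect model error as much as underreporting, so a flag of this kind is a prompt for closer inspection rather than evidence on its own.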

CONCLUSION

This study highlights the critical need for improved data accuracy and reporting mechanisms, especially in resource-constrained regions. International organizations and policymakers must implement strategies to enhance testing capacity and data reliability to better understand and manage the global impact of the pandemic. Our work emphasizes the potential of machine learning to identify gaps in epidemic reporting, facilitating evidence-based interventions.
