FinanceIT Research Group, University of New South Wales, Sydney, NSW, Australia.
Comput Biol Med. 2024 Feb;169:107876. doi: 10.1016/j.compbiomed.2023.107876. Epub 2023 Dec 24.
Preventing and controlling the growing number of serious epidemics requires the ability to predict the risk posed by emerging outbreaks. However, apart from EPIRISK, most current risk prediction tools are limited because each is designed to target only one specific disease in one country. Differences between countries and diseases (e.g., in economic conditions or modes of transmission) make it difficult to build models that can predict across countries and diseases. This lack of universality hampers domestic and international efforts to prevent and control pandemic outbreaks. To address this problem, we used outbreak data on 43 diseases from 206 countries to develop a universal risk prediction system that can be applied across countries and diseases. The system uses five machine learning models (Neural Network, XGBoost, Logistic Boost, Random Forest, and Kernel SVM), whose individual predictions are aggregated by voting into an ensemble prediction. Using economic, cultural, social, and epidemiological factors as inputs, it predicts outbreak risk with approximately 80% to 90% accuracy. Three different datasets were designed to test the performance of the ML models under different realistic scenarios. The prediction system shows strong predictive ability, adaptability, and generality. It can provide universal outbreak risk assessments that are not limited by borders or disease type, facilitating rapid responses to pandemic outbreaks, government decision-making, and international cooperation.
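The ensemble step described above, where five models each predict and then vote, can be illustrated with a minimal sketch. The abstract does not state the exact voting rule or label scheme. This sketch therefore assumes simple hard (majority) voting over binary risk labels (1 = high outbreak risk, 0 = low) and treats every model as already trained. The function names `majority_vote` and `accuracy` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(model_predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model class labels by hard (majority) voting.

    model_predictions[m][i] is model m's predicted label for sample i.
    Assumption: hard voting; the source does not specify the rule.
    Ties go to the label seen first among the votes for that sample,
    because Counter.most_common orders equal counts by first occurrence.
    """
    if not model_predictions:
        raise ValueError("need at least one model")
    n_samples = len(model_predictions[0])
    if any(len(p) != n_samples for p in model_predictions):
        raise ValueError("all models must predict the same number of samples")
    ensemble = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i.
        votes = Counter(p[i] for p in model_predictions)
        label, _count = votes.most_common(1)[0]
        ensemble.append(label)
    return ensemble


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Toy, made-up labels for three samples from five hypothetical models.
    preds = [
        [1, 0, 1],  # Neural Network
        [1, 0, 0],  # XGBoost
        [0, 0, 1],  # Logistic Boost
        [1, 1, 1],  # Random Forest
        [1, 0, 0],  # Kernel SVM
    ]
    ensemble = majority_vote(preds)
    print(ensemble)                       # [1, 0, 1]
    print(accuracy(ensemble, [1, 0, 0]))  # 2 of 3 correct
```

With five voters and two classes, hard voting can never tie. This is one practical reason to use an odd number of models. A soft-voting variant, which averages predicted probabilities, would be an equally plausible reading of the abstract.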