National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research); NHC Key Laboratory of Parasite and Vector Biology; WHO Collaborating Centre for Tropical Diseases; National Center for International Research on Tropical Diseases, Shanghai, 200025, China.
National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases, National Institute of Parasitic Diseases at Chinese Center for Disease Control and Prevention, Chinese Center for Tropical Diseases Research, Shanghai, 200025, China.
BMC Public Health. 2024 Mar 20;24(1):865. doi: 10.1186/s12889-024-17929-9.
Following China's official designation as a malaria-free country by the WHO, imported malaria has become a key determinant of the risk of malaria re-establishment in China. This study explores the application prospects of machine learning algorithms in assessing the risk of imported malaria in China.
Data on imported malaria cases in China from 2011 to 2019 were provided by China CDC; historical epidemic data for malaria-endemic countries were obtained from the World Malaria Report; all other data used in this study are open access. All data processing and model construction were performed in R, and maps were produced with ArcGIS.
A total of 27,088 malaria cases were imported into China from 85 countries between 2011 and 2019. After preprocessing and classification, the clean dataset had 765 rows (85 countries × 9 years) and 11 columns. Six machine learning models were constructed on the training set, and the Random Forest (RF) model performed best in model evaluation. According to the RF model, the features with the highest importance were the number of malaria deaths and the number of indigenous malaria cases. The RF model forecast risk levels for 2019 with an accuracy of 95.3%, in close agreement with the observed outcomes, indicating that it predicts risk levels reliably.
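The 85 * 9 dataset layout and the 2019 evaluation can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: the country labels are hypothetical placeholders, and the split shown here (fit on 2011-2018, evaluate on 2019) is an assumption, since the abstract does not describe how the training set was formed.

```python
from itertools import product

# Hypothetical placeholders: the abstract does not list the 85 source countries.
countries = [f"country_{i:02d}" for i in range(1, 86)]
years = range(2011, 2020)  # 2011 through 2019 inclusive, i.e. 9 years

# One row per (country, year) pair, matching the 85 * 9 layout in the text.
panel = [{"country": c, "year": y} for c, y in product(countries, years)]

# Assumed temporal holdout: fit on 2011-2018, evaluate on 2019.
train = [row for row in panel if row["year"] < 2019]
test = [row for row in panel if row["year"] == 2019]


def accuracy(predicted, observed):
    """Return the share of predictions that match the observed risk level."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


print(len(panel), len(train), len(test))  # prints: 765 680 85
```

Under this assumed split the 2019 evaluation set would contain 85 country rows, and 81 correct classifications out of 85 would round to 95.3%. The abstract does not report the underlying counts, so this is only a consistency check, not a reported result.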
Machine learning algorithms show reliable application prospects for the risk assessment of imported malaria in China. This study provides a new methodological reference for assessing the risk of imported malaria in China and for adjusting control strategies.