School of Information and Communication Engineering, North University of China, Taiyuan, 030051, China.
School of Mathematics, North University of China, Taiyuan, 030051, China.
Sci Rep. 2024 Nov 19;14(1):28659. doi: 10.1038/s41598-024-80058-1.
Infectious diseases are a global public health problem that threatens human society. Since the 1970s, constantly mutating infectious viruses have been quietly attacking humanity, and at least one new infectious disease is discovered every year. Early warning of infectious diseases can therefore greatly reduce their socio-economic harm. This study is based on COVID-19 epidemic data from China (excluding Macau and Taiwan Province) from 2020 to 2022. First, we used ArcGIS software to analyze the spatial agglomeration pattern of patient numbers across the regions of China through global spatial autocorrelation analysis, local spatial autocorrelation analysis, a center-of-gravity trajectory migration algorithm and other statistical tools; in addition, we screened out the areas with a serious COVID-19 epidemic that require special attention. Then, the autoregressive integrated moving average model (ARIMA), extreme learning machine (ELM), support vector regression (SVR), wavelet neural network (Wavelet), recurrent neural network (RNN) and long short-term memory (LSTM) were used to predict COVID-19 epidemic data in Guangdong Province, China, and the prediction performance of each model was compared using prediction accuracy indicators. Finally, a multi-algorithm fusion learning model based on stacking is proposed to address the poor generalization ability of single-algorithm models in prediction; a radial basis function network (RBF) was used as the second-level meta-learner to fuse the above models, and particle swarm optimization (PSO) was used to optimize the RBF parameters to reduce generalization error. The experimental results show that the integrated model performs better than the single models on the COVID-19 dataset.
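Global spatial autocorrelation of the kind mentioned above is commonly summarized with the global Moran's I statistic. The abstract does not give the formula or the weights used in ArcGIS, so the following is only a minimal sketch under assumptions: the function name `global_morans_i`, the four-region adjacency matrix and the case counts are all hypothetical illustrations, not the study's data or settings.

```python
import numpy as np

def global_morans_i(values, weights):
    """Global Moran's I for one attribute observed over n regions.

    values  : length-n sequence (e.g. case counts per region)
    weights : n x n spatial weight matrix (hypothetical binary adjacency here)
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    if w.shape != (n, n):
        raise ValueError("weights must be an n x n matrix")
    z = x - x.mean()            # deviations from the mean
    s0 = w.sum()                # sum of all spatial weights
    denom = (z ** 2).sum()      # total squared deviation
    if s0 == 0 or denom == 0:
        raise ValueError("undefined for all-zero weights or constant values")
    return (n / s0) * (z @ w @ z) / denom

# Hypothetical example: four regions in a row, neighbours share a border.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

clustered = [10, 9, 2, 1]      # similar values sit next to each other
alternating = [10, 1, 10, 1]   # neighbours always differ

print(global_morans_i(clustered, w))    # positive: spatial clustering
print(global_morans_i(alternating, w))  # negative: spatial dispersion
```

Values above zero suggest that similar counts cluster in neighbouring regions, and values below zero suggest that neighbours tend to differ. Significance testing (for example a permutation test or z-score) is omitted from this sketch.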
To better apply the stacking model to the prediction of new infectious diseases, we applied the prediction model built on the COVID-19 dataset to predicting the numbers of AIDS and pulmonary tuberculosis (PTB) cases, which verified the wide applicability of our model to infectious disease prediction.
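The stacking idea described in the abstract can be illustrated with a small sketch, with several simplifying assumptions. The base learners here are three simple stand-ins (a least-squares autoregression, a persistence forecast and a moving average) rather than the ARIMA, ELM, SVR, wavelet, RNN and LSTM models of the study. The RBF width is fixed by hand instead of being tuned with PSO. The series is synthetic. All names (`lagged`, `fit_ar`, `RBFNet` and so on) are hypothetical. The second-level learner is trained on a held-out block of base-model predictions, which is a holdout variant of stacking.

```python
import numpy as np

def lagged(series, p):
    """Turn a 1-D series into (X, y): the p previous values predict the next."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[i:len(s) - p + i] for i in range(p)])
    return X, s[p:]

# --- level-0 (base) learners: each fitter returns a predict function ---
def fit_ar(X, y):
    A = np.column_stack([X, np.ones(len(X))])          # lags plus intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([Xn, np.ones(len(Xn))]) @ coef

def fit_persistence(X, y):
    return lambda Xn: Xn[:, -1]                        # repeat the last value

def fit_moving_average(X, y):
    return lambda Xn: Xn.mean(axis=1)                  # mean of the lag window

# --- level-1 (meta) learner: Gaussian RBF network with linear output ---
class RBFNet:
    def __init__(self, n_centers=8, width=2.0, seed=0):
        self.n_centers, self.width, self.seed = n_centers, width, seed

    def _phi(self, Z):
        d2 = ((Z[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        hidden = np.exp(-d2 / (2.0 * self.width ** 2))
        return np.column_stack([hidden, np.ones(len(Z))])  # add bias column

    def fit(self, Z, y):
        rng = np.random.default_rng(self.seed)
        k = min(self.n_centers, len(Z))
        self.centers = Z[rng.choice(len(Z), size=k, replace=False)]
        self.w, *_ = np.linalg.lstsq(self._phi(Z), y, rcond=None)
        return self

    def predict(self, Z):
        return self._phi(Z) @ self.w

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic stand-in for a case-count series (not real epidemic data).
rng = np.random.default_rng(42)
t = np.arange(300)
series = 10 + 5 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 0.5, t.size)

X, y = lagged(series, p=7)
a, b = 150, 230                      # temporal split: base / meta / test

base_fitters = [fit_ar, fit_persistence, fit_moving_average]
base_models = [f(X[:a], y[:a]) for f in base_fitters]

def level1_features(Xn):
    """Stack base-model predictions column-wise as meta-learner inputs."""
    return np.column_stack([m(Xn) for m in base_models])

meta = RBFNet().fit(level1_features(X[a:b]), y[a:b])
stacked_pred = meta.predict(level1_features(X[b:]))

print("stacked RMSE:", rmse(stacked_pred, y[b:]))
for f, m in zip(base_fitters, base_models):
    print(f.__name__, "RMSE:", rmse(m(X[b:]), y[b:]))
```

Splitting by time rather than at random keeps the meta-learner from seeing future values. In a fuller version, the fixed `width` and the randomly chosen centers would be the natural parameters to hand to an optimizer such as PSO.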