Third Department of Tuberculosis, Anhui Chest Hospital, 397 Jixi Road, Shushan, Hefei, 230000, China.
Department of Oncology, The 901st Hospital of Joint Logistic Support Force PLA, Hefei, 230032, China.
BMC Pulm Med. 2024 Oct 26;24(1):536. doi: 10.1186/s12890-024-03296-z.
Tuberculosis is one of the most common communicable diseases and a matter of global concern, yet accurately predicting its incidence remains challenging. Here we constructed a time-series analysis and fusion tool that uses multi-source data, with the aim of more accurately predicting the incidence trend of tuberculosis in Anhui Province from 2013 to 2023. Random forest (RF), recursive feature elimination (RFE) and the least absolute shrinkage and selection operator (LASSO) were used to improve the derivation and selection of features related to infectious diseases. Based on the characteristics of infectious disease data, we built a multi-input long short-term memory recurrent neural network, tuned by particle swarm optimization (PSO) and integrated with RF-RFE-LASSO feature selection (RRL-PSO-MiLSTM), to perform more accurate prediction. Compared with common single-input and multi-input time-series models, PSO-MiLSTM achieved excellent prediction results (test set MSE: 42.3555, MAE: 59.3333, RMSE: 146.7237, MAPE: 2.1133, R: 0.8634). PSO-MiLSTM adds to methodological research on calibrating time-series prediction of infectious diseases with multi-source data. It could serve as a new benchmark for future public-health analyses of the factors influencing infectious diseases and of their incidence trends, and as a reference for predicting infectious disease incidence rates.
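The abstract does not describe how the multi-source inputs are fed to the MiLSTM. A common arrangement for multi-input recurrent models, assumed here, slices aligned series into fixed-length look-back windows of shape (samples, time steps, inputs), each paired with the next-step incidence. The sketch below shows this in plain Python; `make_windows` and `lookback` are hypothetical names, not the authors' code.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    lookback: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Slice aligned series into (samples, lookback, n_inputs) windows.

    features: one row per time step holding the selected input variables
              (hypothetical, e.g. lagged incidence plus other covariates).
    target:   observed incidence at each time step.
    Each sample uses `lookback` consecutive rows of features to predict the
    target at the step immediately after the window.
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    X: List[List[List[float]]] = []
    y: List[float] = []
    for end in range(lookback, len(target)):
        # Window covers steps end - lookback .. end - 1; the label is step `end`.
        X.append([list(row) for row in features[end - lookback:end]])
        y.append(target[end])
    return X, y


# Example: 4 time steps, 2 input variables, look-back of 2 steps.
X, y = make_windows([[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0]],
                    [5.0, 6.0, 7.0, 8.0], lookback=2)
print(len(X), len(X[0]), len(X[0][0]), y)  # 2 2 2 [7.0, 8.0]
```

The resulting nested lists can be converted to a 3-D array for a recurrent layer; the window length itself would be another hyperparameter to tune.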
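Particle swarm optimization tunes a model by moving a population of candidate parameter vectors through the search space, each pulled toward its own best position and toward the swarm's best. The abstract does not state which LSTM hyperparameters were tuned or which PSO variant was used. The sketch below is a minimal global-best PSO, assuming a box-bounded continuous search space; `pso_minimize` is a hypothetical helper, and the toy quadratic is a stand-in for the validation error a real run would obtain by training the network.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 100,
    w: float = 0.7,    # inertia weight
    c1: float = 1.5,   # pull toward each particle's own best position
    c2: float = 1.5,   # pull toward the swarm's best position
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization over a box-bounded space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move, then clip the particle back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy stand-in for "validation error of an LSTM trained with hyperparameters x".
best, best_val = pso_minimize(lambda x: (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2,
                              bounds=[(-10.0, 10.0), (-10.0, 10.0)])
print([round(v, 3) for v in best], round(best_val, 6))
```

Integer-valued hyperparameters such as the number of hidden units would need rounding inside the objective before training; that handling is likewise an assumption here.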
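The test-set metrics reported above are standard point-forecast measures. The function below is a minimal sketch of how they are usually computed; it assumes MAPE is expressed in percent and that R denotes the Pearson correlation between observed and predicted values, since the abstract defines neither convention. `regression_metrics` is a hypothetical name.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """MSE, MAE, RMSE, MAPE (%) and Pearson R for a test set."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE in percent; undefined (ZeroDivisionError) if any observed value is 0.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    # Pearson correlation between observed and predicted values (assumed meaning of R).
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    sd_t = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sd_p = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    r = cov / (sd_t * sd_p)
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "MAPE": mape, "R": r}


m = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 310.0])
print({k: round(v, 4) for k, v in m.items()})
```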