Guangxi Collaborative Innovation Center for Biomedicine, Life Science Institute, Guangxi Medical University, Nanning 530021, Guangxi, China.
Guangxi Key Laboratory of AIDS Prevention and Treatment & Guangxi Universities Key Laboratory of Prevention and Control of Highly Prevalent Disease, School of Public Health, Guangxi Medical University, Nanning 530021, Guangxi, China.
Epidemiol Infect. 2019 Jan;147:e194. doi: 10.1017/S095026881900075X.
Guangxi, a province in southwestern China, has the second-highest reported number of HIV/AIDS cases in China. This study aimed to develop an accurate and effective model to describe the trend of HIV and to predict its incidence in Guangxi. HIV incidence data for Guangxi from 2005 to 2016 were obtained from the database of the Chinese Center for Disease Control and Prevention. Long short-term memory (LSTM) neural network models, autoregressive integrated moving average (ARIMA) models, generalised regression neural network (GRNN) models and exponential smoothing (ES) were used to fit the incidence data. Data from 2015 and 2016 were used to validate the most suitable models. Model performance was evaluated using four metrics: mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). The LSTM model had the lowest MSE when the N value (time step) was 12. The most appropriate ARIMA models for incidence in 2015 and 2016 were ARIMA(1, 1, 2)(0, 1, 2)₁₂ and ARIMA(2, 1, 0)(1, 1, 2)₁₂, respectively. The accuracy of the GRNN and ES models in forecasting HIV incidence in Guangxi was relatively poor. All four performance metrics were lower for the LSTM model than for the ARIMA, GRNN and ES models. The LSTM model was more effective than the other time-series models and is important for the monitoring and control of local HIV epidemics.
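The four error metrics named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions, which the abstract does not spell out, so treat the formulas as an assumption about what the authors computed. The function name `forecast_errors` and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for two equal-length lists."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so it is undefined when any observation is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Made-up values for illustration only; these are not data from the study.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 380.0]
metrics = forecast_errors(observed, forecast)
print(metrics)
```

Lower values indicate a closer fit on all four metrics, which is the sense in which the abstract ranks the LSTM model above the ARIMA, GRNN and ES models.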