Prediction of hepatitis E using machine learning models.

Affiliations

School of Data and Computer Science, Shandong Women's University, Jinan, Shandong, China.

Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Shandong Center for Disease Control and Prevention, Jinan, Shandong, China.

Publication information

PLoS One. 2020 Sep 17;15(9):e0237750. doi: 10.1371/journal.pone.0237750. eCollection 2020.

DOI:10.1371/journal.pone.0237750
PMID:32941452
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7497991/
Abstract

BACKGROUND

Accurate and reliable predictions of infectious disease can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task, but their performance varies across data series. Hepatitis E, an acute liver disease, has been a major public health problem. Which model is most appropriate for predicting the incidence of hepatitis E? In this paper, three different methods are applied and their performance is compared.

METHODS

Autoregressive integrated moving average (ARIMA), support vector machine (SVM), and long short-term memory (LSTM) recurrent neural network models were adopted and compared. ARIMA was implemented in Python with statsmodels. SVM was implemented in MATLAB with the LIBSVM library. The LSTM model was built with Keras, a deep learning library. To address overfitting caused by the limited number of training samples, dropout and regularization were applied in the LSTM model. The experimental data were the monthly incidence and the monthly number of cases of hepatitis E in Shandong province, China, from January 2005 to December 2017. Data from July 2015 to December 2017 were used to validate the models, and the rest served as the training set. Three metrics were used to compare model performance: root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE).
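As a minimal sketch of the evaluation setup described above, the following shows a chronological train/validation split and the three error metrics in plain Python. The function names and the placeholder series are assumptions made for illustration; they are not the authors' code, and the series is not the study data. The split sizes follow from the stated dates: January 2005 to December 2017 is 156 months, of which the last 30 (July 2015 to December 2017) are held out.

```python
import math

def time_split(series, n_validation):
    """Hold out the last n_validation points; time series are not shuffled."""
    if not 0 < n_validation < len(series):
        raise ValueError("n_validation must be between 1 and len(series) - 1")
    return series[:-n_validation], series[-n_validation:]

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Placeholder series of 156 monthly values (hypothetical, NOT the study data).
series = [100 + (i % 12) for i in range(156)]
train, validation = time_split(series, n_validation=30)
print(len(train), len(validation))  # 126 30
```

RMSE and MAE are expressed in the units of the series, so they are not comparable between the incidence and case-number targets, whereas MAPE is scale-free.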

RESULTS

Based on analysis of the data, ARIMA(1, 1, 1) and ARIMA(3, 1, 2) were selected as the prediction models for monthly incidence and number of cases, respectively. Cross-validation and grid search were used to optimize the SVM parameters: the penalty coefficient C and the kernel function parameter g were set to 8 and 0.125 for incidence prediction, and to 22 and 0.01 for case-number prediction. The LSTM had 4 nodes, with the dropout and L2 regularization parameters set to 0.15 and 0.001, respectively. For ARIMA, SVM, and LSTM, respectively, the RMSE was 0.022, 0.0204, and 0.01 for incidence prediction and 22.25, 20.0368, and 11.75 for case-number prediction. The MAPE was 23.5%, 21.7%, and 15.08% for incidence prediction and 23.6%, 21.44%, and 13.6% for case-number prediction. The MAE was 0.018, 0.0167, and 0.011 for incidence prediction and 18.003, 16.5815, and 9.984 for case-number prediction.
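To make the reported comparison easier to read, the short sketch below tabulates the validation errors quoted in the abstract and computes, for each target and metric, which model has the lowest error and how much lower the LSTM error is than the ARIMA error. The numbers are copied from the text; the dictionary layout and helper names are assumptions chosen for illustration.

```python
# Validation errors as reported in the abstract, ordered (ARIMA, SVM, LSTM).
MODELS = ("ARIMA", "SVM", "LSTM")
reported = {
    ("incidence", "RMSE"): (0.022, 0.0204, 0.01),
    ("incidence", "MAPE"): (23.5, 21.7, 15.08),
    ("incidence", "MAE"): (0.018, 0.0167, 0.011),
    ("cases", "RMSE"): (22.25, 20.0368, 11.75),
    ("cases", "MAPE"): (23.6, 21.44, 13.6),
    ("cases", "MAE"): (18.003, 16.5815, 9.984),
}

def best_model(errors):
    """Name of the model with the smallest error."""
    return MODELS[errors.index(min(errors))]

def reduction_vs_arima(errors):
    """Percent reduction of the LSTM error relative to the ARIMA error."""
    arima, _, lstm = errors
    return 100.0 * (arima - lstm) / arima

for (target, metric), errors in reported.items():
    print(f"{target:9s} {metric:4s} best={best_model(errors)} "
          f"LSTM vs ARIMA: {reduction_vs_arima(errors):.1f}% lower")
```

For example, the case-number RMSE falls from 22.25 (ARIMA) to 11.75 (LSTM), a reduction of about 47%.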

CONCLUSIONS

Comparing ARIMA, SVM, and LSTM, we found that the nonlinear models (SVM, LSTM) outperformed the linear model (ARIMA). LSTM achieved the best performance on all three metrics (RMSE, MAPE, and MAE). Hence, of the three models, LSTM is the most suitable for predicting the monthly incidence and number of cases of hepatitis E.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0989/7497991/c7009282d034/pone.0237750.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0989/7497991/e2eb29e22f35/pone.0237750.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0989/7497991/4409a0f7816b/pone.0237750.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0989/7497991/b34c2477d7a5/pone.0237750.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0989/7497991/8cd7893a52f4/pone.0237750.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0989/7497991/a9bbbbe7d537/pone.0237750.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0989/7497991/c877d102adfa/pone.0237750.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0989/7497991/4c1ef4f2c163/pone.0237750.g008.jpg
