Berke Olaf, Trotz-Williams Lise, de Montigny Simon
Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON.
Wellington-Dufferin-Guelph Public Health, Guelph, ON.
Can Commun Dis Rep. 2020 Jun 4;46(6):192-197. doi: 10.14745/ccdr.v46i06a07.
The rise of big data and related predictive modelling based on machine learning algorithms over the last two decades has provided new opportunities for disease surveillance and public health preparedness. Big data come with the promise of faster generation of, and access to, more precise information, potentially facilitating predictive precision in public health ("precision public health"). As an example, we considered forecasting the future course of monthly cryptosporidiosis incidence in Ontario.
The traditional statistical approach to forecasting is the seasonal autoregressive integrated moving-average (SARIMA) model. We applied SARIMA and an artificial neural network (ANN) approach, specifically a feed-forward neural network, to predict monthly cryptosporidiosis incidence in Ontario in 2017 using 2005-2016 data as a training set. Both forecasting approaches are automated to make them relevant in a disease surveillance context. We compared the resulting forecasts using the root mean squared error (RMSE) and mean absolute error (MAE) as measures of predictive accuracy.
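The two accuracy measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `rmse` and `mae` and the toy numbers are assumptions made up for this example and are not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared forecast error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute forecast errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy values (hypothetical, NOT study data): observed counts and two forecasts.
observed = [2.0, 4.0, 6.0, 8.0]
forecast_a = [3.0, 4.0, 5.0, 8.0]   # two small misses
forecast_b = [2.0, 4.0, 6.0, 12.0]  # one large miss

for name, forecast in [("A", forecast_a), ("B", forecast_b)]:
    print(name, "RMSE:", rmse(observed, forecast), "MAE:", mae(observed, forecast))
```

In this toy case forecast B has twice the MAE of forecast A but almost three times the RMSE, because squaring penalizes a single large miss more heavily than several small ones. Reporting both measures, as done here, shows whether one approach's errors are concentrated in a few months.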
Cryptosporidiosis is a seasonal disease, which peaks in Ontario in late summer. In this study, the SARIMA model and ANN forecasting approaches captured the seasonal pattern of cryptosporidiosis well. Contrary to similar studies reported in the literature, the ANN forecasts of cryptosporidiosis were slightly less accurate than the SARIMA model forecasts.
The ANN and SARIMA approaches are suitable for automated forecasting of public health time series data from surveillance systems. Future studies should employ additional algorithms (e.g. random forests) and assess accuracy by using other diseases as case studies and by conducting rigorous simulation studies. Differences between the forecasts from the machine learning algorithm (the ANN) and the statistical learning model (the SARIMA model) should be considered with respect to the philosophical differences between the two approaches.