Fan Bi, Peng Jiaxuan, Guo Hainan, Gu Haobin, Xu Kangkang, Wu Tingting
College of Management, Institute of Business Analysis and Supply Chain Management, Shenzhen University, Shenzhen, China.
Faculty of Science, University of St Andrews, St Andrews, United Kingdom.
JMIR Med Inform. 2022 Jul 20;10(7):e34504. doi: 10.2196/34504.
Emergency department (ED) overcrowding is a concerning global health care issue, mainly caused by the uncertainty of patient arrivals, especially during a pandemic. Accurate forecasting of patient arrivals allows health resources to be allocated in advance to reduce overcrowding. Currently, traditional data, such as historical patient visits, weather, holidays, and calendar information, are primarily used to create forecasting models. However, data from internet search engines (eg, Google) have been studied less, although they can provide pivotal real-time surveillance information. Internet data can be used to improve forecasting performance and provide early warning, especially during an epidemic. Moreover, possible nonlinearities between patient arrivals and these variables are often ignored.
This study aims to develop an intelligent forecasting system that combines machine learning models with an internet search index to accurately predict ED patient arrivals, to verify the effectiveness of the internet search index, and to explore whether nonlinear models can improve forecasting accuracy.
Data on ED patient arrivals were collected from July 12, 2009, to June 27, 2010, a period covering the 2009 H1N1 pandemic. These included 139,910 ED visits at our collaborating hospital, one of the largest public hospitals in Hong Kong. Traditional data were collected over the same period. The internet search index was generated from 268 Google search queries to comprehensively capture information about potential patients. The relationship between the index and patient arrivals was verified using the Pearson correlation coefficient, the Johansen cointegration test, and the Granger causality test. Linear and nonlinear models were then developed with the internet search index to predict patient arrivals, and their accuracy and robustness were examined.
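The relationship checks above can be illustrated with a minimal, dependency-free sketch: a Pearson correlation, plus a Granger-causality F statistic that compares a restricted autoregression of arrivals (its own lags only) with an unrestricted one that also includes lags of the search index. This is an illustration under stated assumptions, not the authors' code. The function names, the lag order `p=2`, and the synthetic series are hypothetical. The series are assumed to be stationary (for example, differenced) before the Granger test, and the Johansen cointegration test is not reproduced here. In practice, statsmodels provides `grangercausalitytests` and `coint_johansen` for these tests.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _ols_rss(rows, target):
    """Residual sum of squares of an OLS fit (normal equations, Gaussian elimination)."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        beta[i] = (a[i][k] - sum(a[i][j] * beta[j] for j in range(i + 1, k))) / a[i][i]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, target))


def granger_f(x, y, p=2):
    """F statistic for 'x Granger-causes y' with p lags (hypothetical helper)."""
    target, restricted, unrestricted = [], [], []
    for t in range(p, len(y)):
        y_lags = [y[t - i] for i in range(1, p + 1)]
        x_lags = [x[t - i] for i in range(1, p + 1)]
        target.append(y[t])
        restricted.append([1.0] + y_lags)             # intercept + own lags
        unrestricted.append([1.0] + y_lags + x_lags)  # ... plus lags of x
    rss_r = _ols_rss(restricted, target)
    rss_u = _ols_rss(unrestricted, target)
    dof = len(target) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Synthetic example: y depends on x one step earlier, plus small noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 300)]
print(granger_f(x, y))  # large: past x helps predict y
print(granger_f(y, x))  # small: past y adds little for x
```

A large F in the index-to-arrivals direction and a small one in the reverse direction is the pattern a "strong predictor" finding corresponds to. Converting F to a P value requires the F distribution with `p` and `dof` degrees of freedom, which is omitted here to stay within the standard library.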
All models predicted patient arrivals accurately. The causality test indicated that the internet search index is a strong predictor of ED patient arrivals. With the internet search index, the mean absolute percentage error (MAPE) and root mean square error (RMSE) of the linear model decreased from 5.3% to 5.0% and from 24.44 to 23.18, respectively, whereas those of the nonlinear model decreased even more, from 3.5% to 3% and from 16.72 to 14.55, respectively. Comparing the models, the forecasting system combining the extreme learning machine with the internet search index performed best in both forecasting accuracy and robustness analysis.
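The two error metrics can be written out explicitly. The sketch below assumes their standard definitions, MAPE = (100/n) Σ |(y_t - ŷ_t) / y_t| and RMSE = √((1/n) Σ (y_t - ŷ_t)²), where y_t is the observed and ŷ_t the predicted arrival count. The helper names and the two-day toy series are hypothetical; `relative_reduction` simply expresses a reported change as a fraction of the original error.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the units of the series (here, patient arrivals)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_reduction(before, after):
    """Fractional improvement of an error metric; 0.13 means 13% lower."""
    return (before - after) / before


# Hypothetical daily arrival counts, for illustration only.
actual = [100, 200]
predicted = [110, 190]
print(round(mape(actual, predicted), 6))  # → 7.5
print(rmse(actual, predicted))            # → 10.0

# Reported nonlinear-model RMSE without vs with the search index.
print(round(relative_reduction(16.72, 14.55), 3))  # → 0.13
```

Applied to the figures reported above, adding the search index lowers RMSE by about 5.2% for the linear model (24.44 to 23.18) and by about 13.0% for the nonlinear model (16.72 to 14.55). MAPE assumes nonzero actual values, which holds for daily ED arrival counts.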
The proposed forecasting system can make accurate, real-time predictions of ED patient arrivals. Compared with the static traditional variables, the internet search index significantly improves forecasting (P=.002) as a reliable predictor that tracks continuous behavioral trends and sudden changes during an epidemic. The nonlinear model outperforms the linear models by capturing the dynamic relationship between the index and patient arrivals. Thus, the system can facilitate staff planning and workflow monitoring.
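The abstract identifies an extreme learning machine (ELM) as the best-performing nonlinear model but gives no architecture details. The sketch below is a generic single-hidden-layer ELM in plain Python under stated assumptions: a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and a small ridge term (`ridge`) used in place of the Moore-Penrose pseudoinverse of the basic ELM formulation. The class name `ELMRegressor`, the hidden-layer size, and the toy target are hypothetical choices, not the authors' configuration. In a forecasting setup, each input row might hold lagged arrivals and the search index, scaled to a small range.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


class ELMRegressor:
    """Hypothetical minimal ELM: random hidden layer, least-squares output layer."""

    def __init__(self, n_hidden=40, ridge=1e-6, seed=0):
        self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

    def _hidden(self, x):
        # h_j(x) = sigmoid(w_j . x + b_j); w_j and b_j stay fixed after fit().
        return [_sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.w, self.b)]

    def fit(self, X, y):
        rng = random.Random(self.seed)
        d, k = len(X[0]), self.n_hidden
        # Step 1: draw input weights and biases at random; they are never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(k)]
        self.b = [rng.uniform(-1, 1) for _ in range(k)]
        H = [self._hidden(x) for x in X]
        # Step 2: output weights from (H^T H + ridge * I) beta = H^T y.
        hth = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(h[i] * t for h, t in zip(H, y)) for i in range(k)]
        self.beta = _solve(hth, hty)
        return self

    def predict(self, X):
        return [sum(bj * hj for bj, hj in zip(self.beta, self._hidden(x))) for x in X]


# Toy nonlinear target for illustration: y = x^2 on [-1, 1].
X = [[-1 + 2 * i / 49] for i in range(50)]
y = [x[0] ** 2 for x in X]
model = ELMRegressor(n_hidden=40).fit(X, y)
pred = model.predict(X)
print(math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)))
```

Because only the output weights are solved for, training reduces to a single linear system of size `n_hidden`, which is the usual argument for ELM's speed. Keeping inputs in a small range matters here, since large inputs would saturate the random sigmoid units.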