Poirier Canelle, Lavenu Audrey, Bertaud Valérie, Campillo-Gimenez Boris, Chazard Emmanuel, Cuggia Marc, Bouzillé Guillaume
Laboratoire Traitement du Signal et de l'Image, Université de Rennes 1, Rennes, France.
INSERM, U1099, Rennes, France.
JMIR Public Health Surveill. 2018 Dec 21;4(4):e11361. doi: 10.2196/11361.
Traditional surveillance systems produce estimates of influenza-like illness (ILI) incidence rates, but with a 1- to 3-week delay. Accurate real-time monitoring systems for influenza outbreaks could be useful for making public health decisions. Several studies have investigated the possibility of using internet users' activity data and different statistical models to predict influenza epidemics in near real time. However, very few studies have investigated the use of hospital big data for this purpose.
Here, we compared internet data and electronic health record (EHR) data, as well as different statistical models, to identify the best approach (data type and statistical model) for estimating ILI incidence in real time.
We used Google data as the internet data source and the clinical data warehouse eHOP, which includes all EHRs from Rennes University Hospital (France), as the hospital data source. We compared 3 statistical models: random forest, elastic net, and support vector machine (SVM).
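To make one of the three compared models concrete, the following is a minimal pure-Python sketch of elastic net regression fitted by cyclic coordinate descent. It assumes the common scikit-learn-style objective (1/2n)·||y − b0 − Xb||² + alpha·(l1_ratio·||b||₁ + ½(1 − l1_ratio)·||b||²). The function names `fit_elastic_net` and `soft_threshold` and the default hyperparameters are hypothetical; the abstract does not state how the authors implemented or tuned their models.

```python
from typing import List, Tuple


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(
    X: List[List[float]],
    y: List[float],
    alpha: float = 0.1,
    l1_ratio: float = 0.5,
    n_iter: int = 1000,
    tol: float = 1e-8,
) -> Tuple[float, List[float]]:
    """Hypothetical elastic net fit; returns (intercept, coefficients)."""
    n, p = len(y), len(X[0])
    # Center features and target so the intercept can be recovered at the end.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    residuals = [v - y_mean for v in y]  # r = yc - Xc @ coefs, with coefs = 0
    coefs = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: leave its coefficient at zero
            old = coefs[j]
            # Partial correlation of feature j with the residual excluding j.
            rho = sum(Xc[i][j] * (residuals[i] + Xc[i][j] * old) for i in range(n)) / n
            new = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1.0 - l1_ratio))
            if new != old:
                delta = new - old
                for i in range(n):
                    residuals[i] -= Xc[i][j] * delta
                coefs[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break

    intercept = y_mean - sum(c * m for c, m in zip(coefs, x_means))
    return intercept, coefs
```

With `alpha=0` the sketch reduces to ordinary least squares; increasing `alpha` with `l1_ratio=1` drives weak predictors exactly to zero, which is the property that makes elastic net attractive when many candidate search terms or EHR variables are available.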
For the national ILI incidence rate, the best performance (correlation 0.98, mean squared error [MSE] 866) was obtained with hospital data and the SVM model. For the Brittany region, the best performance (correlation 0.923, MSE 2364) was also obtained with hospital data and the SVM model.
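The two reported metrics capture different things, which is why both are given. The sketch below computes the Pearson correlation and the MSE between observed and predicted weekly incidence rates; the function names are illustrative, and the abstract does not specify exactly how the authors computed these metrics.

```python
import math
from typing import List


def pearson_correlation(observed: List[float], predicted: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_squared_error(observed: List[float], predicted: List[float]) -> float:
    """Average squared difference, in squared units of the incidence rate."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
```

Correlation is insensitive to a constant offset, whereas MSE is not: a forecast that is always 10 units too high yields a correlation of 1.0 but an MSE of 100. MSE is also expressed in squared units of the incidence rate, so it is not directly comparable between series on different scales, such as the national and regional series.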
We found that EHR data combined with historical epidemiological information (French Sentinelles network) allowed accurate prediction of ILI incidence rates for France as a whole and for the Brittany region, and outperformed internet data regardless of the statistical model used. Moreover, the two statistical models elastic net and SVM performed comparably.
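One plausible way to combine current EHR signals with historical surveillance data, given the 1- to 3-week reporting delay, is to pair each week's EHR features with the most recent Sentinelles rate that would actually be available at that time. The sketch below builds such a design matrix. The function name, the single lagged rate, and the default lag of one week are all hypothetical; the abstract does not describe which EHR variables or which lags the authors used.

```python
from typing import List, Sequence, Tuple


def build_design_matrix(
    ehr_features: Sequence[Sequence[float]],
    sentinel_rates: Sequence[float],
    lag: int = 1,
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical feature builder: row t = EHR features at week t plus the
    surveillance rate from week t - lag; target = surveillance rate at week t."""
    if len(ehr_features) != len(sentinel_rates):
        raise ValueError("ehr_features and sentinel_rates must cover the same weeks")
    if lag < 1:
        raise ValueError("lag must be at least 1 week (current rates are not yet known)")
    X: List[List[float]] = []
    y: List[float] = []
    for week in range(lag, len(sentinel_rates)):
        X.append(list(ehr_features[week]) + [sentinel_rates[week - lag]])
        y.append(sentinel_rates[week])
    return X, y
```

Requiring `lag >= 1` keeps the target week's own surveillance value out of its features, so the resulting rows mimic what a real-time (nowcasting) model could see.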