Université de Rennes, CHU Rennes, INSERM, LTSI - UMR 1099, Rennes, France.
CHU Martinique, Centre de Données Cliniques, Martinique, France.
PLoS Negl Trop Dis. 2022 Jan 7;16(1):e0010056. doi: 10.1371/journal.pntd.0010056. eCollection 2022 Jan.
BACKGROUND: Traditionally, dengue surveillance is based on case reporting to a central health agency. However, the delay between a case and its notification can limit the system's responsiveness. Machine learning methods have been developed to reduce reporting delays and to predict outbreaks, based on non-traditional and non-clinical data sources. The aim of this systematic review was to identify studies that used real-world data, Big Data and/or machine learning methods to monitor and predict dengue-related outcomes.
METHODOLOGY/PRINCIPAL FINDINGS: We searched PubMed, Scopus, Web of Science and grey literature for the period from January 1, 2000 to August 31, 2020. The review (ID: CRD42020172472) focused on data-driven studies. Reviews, randomized controlled trials and descriptive studies were not included. Among the 119 studies included, 67% were published between 2016 and 2020, and 39% used at least one novel data stream. The aim of the included studies was to predict a dengue-related outcome (55%), assess the validity of data sources for dengue surveillance (23%), or both (22%). Most studies (60%) used a machine learning approach. Studies on dengue prediction compared different prediction models, or identified significant predictors among several covariates in a model. The most significant predictors were rainfall (43%), temperature (41%), and humidity (25%). The two best-performing models were Neural Networks and Decision Trees (52%), followed by Support Vector Machine (17%). We cannot rule out selection bias in our study, given our two main limitations: we did not include preprints and could not obtain the opinions of other international experts.
CONCLUSIONS/SIGNIFICANCE: Combining real-world data and Big Data with machine learning methods is a promising approach to improve dengue prediction and monitoring. Future studies should focus on how to better integrate all available data sources and methods to improve stakeholders' response to and management of dengue.