Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance.

Authors

Santillana Mauricio, Nguyen André T, Dredze Mark, Paul Michael J, Nsoesie Elaine O, Brownstein John S

Affiliations

Harvard School of Engineering and Applied Sciences, Cambridge, Massachusetts, United States of America; Boston Children's Hospital Informatics Program, Boston, Massachusetts, United States of America; Harvard Medical School, Boston, Massachusetts, United States of America.

Harvard School of Engineering and Applied Sciences, Cambridge, Massachusetts, United States of America.

Publication information

PLoS Comput Biol. 2015 Oct 29;11(10):e1004513. doi: 10.1371/journal.pcbi.1004513. eCollection 2015 Oct.

Abstract

We present a machine learning-based methodology capable of providing real-time ("nowcast") and forecast estimates of influenza activity in the US by leveraging data from multiple sources: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system. Our main contribution consists of combining multiple influenza-like illness (ILI) activity estimates, generated independently with each data source, into a single prediction of ILI utilizing machine learning ensemble approaches. Our methodology exploits the information in each data source and produces accurate weekly ILI predictions for up to four weeks ahead of the release of CDC's ILI reports. We evaluate the predictive ability of our ensemble approach during the 2013-2014 (retrospective) and 2014-2015 (live) flu seasons for each of the four weekly time horizons. Our ensemble approach demonstrates several advantages: (1) our ensemble method's predictions outperform every prediction using each data source independently, (2) our methodology can produce predictions one week ahead of Google Flu Trends' (GFT) real-time estimates with comparable accuracy, and (3) our two- and three-week forecast estimates have comparable accuracy to real-time predictions using an autoregressive model. Moreover, our results show that considerable insight is gained from incorporating disparate data streams, in the form of social media and crowd-sourced data, into influenza predictions in all time horizons.
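The core idea of the abstract, merging several independently generated ILI estimates into one prediction, can be illustrated with a small sketch. The abstract does not say which ensemble techniques were used, so the inverse-error weighting below is a stand-in chosen for simplicity, not the authors' method. The source names, the function names `inverse_mse_weights` and `combine`, and all numbers are hypothetical and made up for illustration.

```python
from statistics import fmean


def inverse_mse_weights(history, truth):
    """Weight each source by the inverse of its mean squared error on past weeks.

    history: dict mapping a source name to its past weekly ILI estimates
    truth:   reference ILI values for the same weeks (e.g. later official reports)
    """
    inverse_errors = {}
    for name, estimates in history.items():
        mse = fmean([(e - t) ** 2 for e, t in zip(estimates, truth)])
        # Small constant guards against division by zero for a perfect source.
        inverse_errors[name] = 1.0 / (mse + 1e-12)
    total = sum(inverse_errors.values())
    # Normalize so the weights sum to 1.
    return {name: value / total for name, value in inverse_errors.items()}


def combine(weights, current):
    """Weighted average of the current week's per-source estimates."""
    return sum(weights[name] * current[name] for name in weights)


# Hypothetical, made-up numbers (not data from the paper).
truth = [1.0, 2.0, 3.0, 4.0]
history = {
    "search": [1.1, 2.1, 3.1, 4.1],
    "twitter": [1.5, 2.5, 3.5, 4.5],
    "hospital": [0.8, 1.8, 2.8, 3.8],
}

weights = inverse_mse_weights(history, truth)
nowcast = combine(weights, {"search": 5.1, "twitter": 5.5, "hospital": 4.8})
print(weights)
print(nowcast)
```

In this toy setup the source with the smallest historical error receives the largest weight, and the combined nowcast always lies between the lowest and highest individual estimates. A learned ensemble (for example, a regression fitted on the per-source estimates) could assign weights differently.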

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8f67/4626021/a34bbb7c1603/pcbi.1004513.g001.jpg
