Elkin Lauren S, Topal Kamil, Bebek Gurkan
Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA.
Center for Proteomic and Bioinformatics, Department of Nutrition, Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA.
Inf Discov Deliv. 2017;45(3):110-120. doi: 10.1108/IDD-05-2017-0046.
PURPOSE: Predicting future outbreaks and understanding how they spread from location to location can improve the patient care provided. Recently, mining social media big data has made it possible to track patterns and trends across the world. This study aims to analyze social media micro-blogs and geographical locations to understand how disease outbreaks spread across geographies and to enhance forecasting of future disease outbreaks.
DESIGN/METHODOLOGY/APPROACH: In this paper, the authors use Twitter data as the social media data source, influenza-like illnesses (ILI) as the disease epidemic and states in the USA as the geographical locations. They present a novel network-based model that uses social media big data to predict the spread of diseases a week in advance.
FINDINGS: The authors showed that flu-related tweets align well with ILI data from the Centers for Disease Control and Prevention (CDC) (p < 0.049). The authors compared this model to earlier approaches that utilized airline traffic, and showed that the ILI activity estimates of their model were more accurate. They also found that their disease diffusion model yielded accurate predictions for upcoming ILI activity (p < 0.04), and they predicted the diffusion of flu across states based on geographical surroundings with 76 per cent accuracy. The equations and procedures can be adapted to any social media data, other contagious diseases and other geographies to mine large data sets.
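One common way to check how well two weekly series align is a correlation coefficient with a significance test. The sketch below is an illustration under stated assumptions, not the procedure used in the study: it computes Pearson's r between hypothetical weekly tweet counts and hypothetical ILI rates, and estimates a two-sided p-value with a seeded permutation test. All data values and function names are made up for the example.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def permutation_p_value(x: Sequence[float], y: Sequence[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided p-value: share of shuffles with |r| at least as large."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical weekly series, for illustration only.
tweets = [120, 150, 210, 340, 400, 380, 290, 200]
ili_rate = [1.1, 1.3, 1.9, 2.8, 3.5, 3.2, 2.6, 1.8]
print(pearson_r(tweets, ili_rate), permutation_p_value(tweets, ili_rate))
```

Weekly surveillance series are autocorrelated, so a plain permutation test like this one tends to overstate significance; it is shown only to make the notion of "alignment with a p-value" tangible.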
ORIGINALITY/VALUE: First, while extensive work has been presented utilizing time-series analysis on single geographies, or post-analysis of highly contagious diseases, no previous work has provided a generalized solution to identify how contagious diseases diffuse across geographies, such as states in the USA. Second, owing to the nature of social media data, various statistical models have been used extensively to address these problems.