Robertson Colin, Yee Lauren
Department of Geography & Environmental Studies, Wilfrid Laurier University, 75 University Ave West, Waterloo, ON, N2L 3C5, Canada.
PLoS One. 2016 Nov 23;11(11):e0165688. doi: 10.1371/journal.pone.0165688. eCollection 2016.
The use of Internet-based sources of information for health surveillance applications has increased in recent years, as a greater share of social and media activity happens through online channels. The potential surveillance value of online information about emergent health events includes early warning, situational awareness, risk perception, and evaluation of health messaging, among others. The challenge in harnessing these data lies in the vast number of potential sources to monitor and in developing tools to translate dynamic, unstructured content into actionable information. In this paper we investigated the use of one social media outlet, Twitter, for surveillance of avian influenza (AI) risk in North America. We collected AI-related messages over a five-month period and compared these to official surveillance records of AI outbreaks. A fully automated data extraction and analysis pipeline was developed to acquire, structure, and analyze social media messages in an online context. Two methods of outbreak detection, a static threshold and a cumulative-sum (CUSUM) dynamic threshold, both based on a time series model of normal activity, were evaluated for their ability to discern important time periods of AI-related messaging and media activity. Our findings show that peaks in activity were related to real-world events, with outbreaks in Nigeria, France and the USA receiving the most attention, while those in China were less evident in the social media data. Topic models identified themes related to specific AI events for the dynamic threshold method, whereas many themes for the static method were ambiguous. Further analyses of these data might focus on quantifying the bias in coverage and the relation between outbreak characteristics and detectability in social media data.
Finally, while the analyses here focused on broad themes and trends, there is likely additional value in developing methods for identifying low-frequency messages, in operationalizing this methodology into a comprehensive system for visualizing patterns extracted from the Internet, and in integrating these data with other sources of information such as wildlife, environmental, and agricultural data.
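The abstract does not specify the time series model of normal activity or the threshold parameters, so the following is only a minimal sketch of how a static threshold and a one-sided CUSUM differ. It assumes, hypothetically, that "normal" activity is summarized by the mean and sample standard deviation of an initial training window; the daily counts, the helper names (`baseline`, `static_threshold`, `cusum`), the window length, and the parameters `z=3`, `k=0.5` and `h=4` are all illustrative choices, not values or code from the study.

```python
from statistics import mean, stdev

def baseline(counts, train_days):
    """Estimate 'normal' activity as the mean and sample SD of an initial window.

    Hypothetical stand-in for the paper's time series model of normal activity.
    """
    window = counts[:train_days]
    return mean(window), stdev(window)

def static_threshold(counts, mu, sigma, z=3.0):
    """Return indices of days whose count exceeds the fixed limit mu + z*sigma."""
    limit = mu + z * sigma
    return [day for day, x in enumerate(counts) if x > limit]

def cusum(counts, mu, sigma, k=0.5, h=4.0):
    """One-sided (upper) CUSUM on standardized counts.

    S_t = max(0, S_{t-1} + (x_t - mu) / sigma - k); signal when S_t > h,
    then restart the statistic at zero (one common convention, assumed here).
    """
    s = 0.0
    alarms = []
    for day, x in enumerate(counts):
        s = max(0.0, s + (x - mu) / sigma - k)
        if s > h:
            alarms.append(day)
            s = 0.0  # restart after each signal
    return alarms

# Hypothetical daily counts of AI-related tweets (not data from the study):
# days 0-9 train the baseline, days 13-17 are a modest sustained rise,
# and day 21 is a single large spike.
counts = [20, 28, 15, 25, 18, 30, 22, 17, 26, 19,
          21, 24, 20, 30, 32, 31, 33, 30, 22, 18, 20, 45]

mu, sigma = baseline(counts, train_days=10)
print("static:", static_threshold(counts, mu, sigma))  # static: [21]
print("cusum: ", cusum(counts, mu, sigma))            # cusum:  [16, 21]
```

In this toy series the fixed limit (about 37 messages per day) reacts only to the day-21 spike, while CUSUM also signals on day 16 because small excesses over the baseline accumulate during the sustained rise. Whether and how the study's dynamic threshold resets after a signal is not stated in the abstract.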