Department of Information Science, University of Colorado, Boulder, Colorado, USA.
Department of Engineering Management and Systems Engineering, George Washington University, Washington, District of Columbia, USA.
BMJ Open. 2019 Jan 15;9(1):e024018. doi: 10.1136/bmjopen-2018-024018.
The Centers for Disease Control and Prevention (CDC) spends significant time and resources tracking influenza vaccination coverage each influenza season through national surveys. Emerging data from social media offer an alternative means of surveilling influenza vaccination coverage at both national and local levels in near real time.
This study aimed to characterise and analyse the vaccinated population from temporal, demographic and geographical perspectives using automatic classification of vaccination-related Twitter data.
In this cross-sectional study, we continuously collected tweets containing both influenza-related terms and vaccine-related terms over four consecutive influenza seasons from 2013 to 2017. We created a machine learning classifier to identify relevant tweets, then evaluated the approach by comparing its estimates with data from the CDC's FluVaxView. We limited our analysis to tweets geolocated within the USA.
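The collection criterion described above (a tweet must contain both an influenza-related term and a vaccine-related term) can be illustrated with a minimal sketch. The term lists and the function name `matches_both` are hypothetical placeholders chosen for illustration; the abstract does not give the study's actual keyword lists, and this keyword filter is only the collection step, not the machine learning classifier.

```python
import re

# Hypothetical term lists for illustration only; the abstract does not
# specify the keywords the study actually used.
FLU_TERMS = {"flu", "influenza"}
VACCINE_TERMS = {"shot", "shots", "vaccine", "vaccines", "vaccinated", "vaccination"}


def matches_both(text, flu_terms=FLU_TERMS, vaccine_terms=VACCINE_TERMS):
    """Return True if the text has at least one term from each list."""
    # Lowercase and split into alphabetic tokens so that punctuation and
    # capitalisation do not affect matching.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return bool(tokens & flu_terms) and bool(tokens & vaccine_terms)


if __name__ == "__main__":
    examples = [
        "Got my flu shot today!",
        "I think I have the flu",
        "Vaccine appointment booked",
    ]
    for tweet in examples:
        print(matches_both(tweet), "-", tweet)
```

Tweets that pass such a filter would still need to be classified for relevance, for example to separate reports of receiving a vaccine from general discussion.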
We assessed 1 124 839 tweets. We found a strong correlation of 0.799 between monthly Twitter estimates and CDC data, with correlations as high as 0.950 in individual influenza seasons. Our approach obtained geographical correlations of 0.387 at the US state level and 0.467 at the regional level. Finally, we found a higher level of influenza vaccine tweets among female users than among male users, consistent with the results of CDC surveys on vaccine uptake.
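As a rough illustration of how a correlation between two monthly series can be computed, the sketch below implements the Pearson correlation coefficient. It assumes Pearson correlation, which the abstract does not state explicitly, and the two series are made-up placeholder values, not data from the study or from the CDC.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder monthly values (NOT study data): share of vaccination
    # tweets per month and survey-based monthly uptake, both hypothetical.
    twitter_monthly = [0.02, 0.10, 0.35, 0.25, 0.12, 0.06]
    survey_monthly = [0.01, 0.08, 0.30, 0.28, 0.15, 0.05]
    print(f"r = {pearson(twitter_monthly, survey_monthly):.3f}")
```

The same function could be applied to state-level or regional values instead of monthly ones to obtain a geographical correlation.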
Significant correlations between Twitter data and CDC data show the potential of social media for vaccination surveillance. Temporal variability is captured better than geographical and demographic variability. We discuss potential paths forward for leveraging this approach.