Department of Physics and Astronomy, University College London, Gower Street, London, WC1E 6BT, UK.
Department of Physics, University of Warwick, Coventry, CV4 7AL, UK.
Sci Rep. 2021 Sep 24;11(1):19009. doi: 10.1038/s41598-021-98396-9.
In the absence of nationwide mass testing during an emerging health crisis, alternative approaches could efficiently provide the information that policy makers and health bodies need when dealing with a pandemic. This work presents a methodology in which Twitter data surrounding the first wave of the COVID-19 pandemic in the UK are harvested and analysed using two main approaches. The first investigates localized outbreak prediction by developing a prototype early-warning system based on the distribution of total tweet volume. The temporal lag between the rise in the number of COVID-19 related tweets and the rise in deaths officially reported by Public Health England (PHE) is observed to be 6-27 days for various UK cities, which matches the temporal lag values found in the literature. The second approach, intended to better understand the topics of discussion and attitudes surrounding the pandemic, is an in-depth behavioural analysis assessing public opinion and the response to government policies such as the introduction of face-coverings. Using topic modelling, nine distinct topics are identified within the corpus of COVID-19 tweets, with themes ranging from retail to government bodies. Sentiment analysis on a subset of mask-related tweets reveals sentiment spikes corresponding to major news and announcements. A Named Entity Recognition (NER) algorithm is trained and applied in a semi-supervised manner to recognise tweets containing location keywords within the unlabelled corpus, achieving a precision of 81.6%. Overall, these approaches allow the extraction of temporal trends relating to PHE case numbers, popular locations in relation to the use of face-coverings, and attitudes towards face-coverings, vaccines and the national 'Test and Trace' scheme.
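To illustrate how a temporal lag between tweet volume and reported deaths might be estimated, the following is a minimal sketch only. The abstract does not state the estimator used, so the lagged Pearson correlation here is an assumption, as are the function name `best_lag`, the helper `bump`, and the synthetic daily series (a death curve constructed as a copy of the tweet curve delayed by 10 days).

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def best_lag(tweets, deaths, max_lag):
    """Hypothetical helper: score each lag in 0..max_lag by correlating
    tweets on day t with deaths on day t + lag, and return the best lag
    together with all scores."""
    scores = {}
    for lag in range(max_lag + 1):
        x = tweets[: len(tweets) - lag]  # tweet counts on days t
        y = deaths[lag:]                 # death counts on days t + lag
        scores[lag] = pearson(x, y)
    return max(scores, key=scores.get), scores


def bump(t, centre, width=7.0):
    """Smooth single-peak curve used only to build synthetic data."""
    return math.exp(-(((t - centre) / width) ** 2))


# Synthetic daily series (assumption, not data from the study):
# the death curve peaks 10 days after the tweet curve.
days = range(90)
tweets = [1000 * bump(t, 30) for t in days]
deaths = [50 * bump(t, 40) for t in days]

lag, scores = best_lag(tweets, deaths, max_lag=30)
print(f"estimated lag: {lag} days")
```

On real data, daily counts would typically need smoothing and a per-city series before such a comparison is meaningful; those steps are omitted here.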