Pobiruchin Monika, Zowalla Richard, Wiesner Martin
GECKO Institute for Medicine, Informatics & Economics, Heilbronn University, Heilbronn, Germany.
Consumer Health Informatics SIG, German Association for Medical Informatics, Biometry & Epidemiology (GMDS e. V.), Cologne, Germany.
J Med Internet Res. 2020 Aug 28;22(8):e19629. doi: 10.2196/19629.
The spread of the 2019 novel coronavirus disease, COVID-19, across Asia and Europe sparked a significant increase in public interest and media coverage, including on social media platforms such as Twitter. In this context, the origin of information plays a central role in the dissemination of evidence-based information about the SARS-CoV-2 virus and COVID-19. On February 2, 2020, the World Health Organization (WHO) declared a "massive infodemic" and argued that this situation "makes it hard for people to find trustworthy sources and reliable guidance when they need it."
This infoveillance study, conducted during the early phase of the COVID-19 pandemic, focuses on the social media platform Twitter. Twitter allows monitoring of the dynamic pandemic situation on a global scale across different aspects, topics, and languages, as well as for regions and even whole countries. Of particular interest are temporal and geographical variations of COVID-19-related tweets, the situation in Europe, and the categories and origin of shared external resources.
Twitter's Streaming application programming interface (API) was used to filter tweets based on 16 prevalent hashtags related to the COVID-19 outbreak. Each tweet's text and corresponding metadata, as well as the user's profile information, were extracted and stored in a database. Metadata included links to external resources. A link categorization scheme, introduced in a study by Chew and Eysenbach in 2009, was applied to the top 250 shared resources to analyze the relative proportion of each category. Moreover, temporal variations of global tweet volumes were analyzed, and a specific analysis was conducted for the European region.
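The link analysis described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout (`text`, `urls`), the two placeholder hashtags, the domain-to-category mapping, and the "Social Media" label are assumptions made for illustration, and each resource is assumed to count once toward its category's share.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample records; the field names are illustrative only.
tweets = [
    {"text": "Stay safe #COVID19", "urls": ["https://www.who.int/emergencies"]},
    {"text": "Update #coronavirus", "urls": ["https://www.youtube.com/watch?v=abc",
                                             "https://youtube.com/watch?v=def"]},
    {"text": "Unrelated post", "urls": ["https://example.org/page"]},
    {"text": "News #covid19", "urls": ["https://www.bbc.co.uk/news/health"]},
]

# Placeholder set; the study tracked 16 hashtags that are not listed here.
TRACKED_HASHTAGS = {"#covid19", "#coronavirus"}

# Assumed manual mapping from resource (domain) to category.
CATEGORY_BY_DOMAIN = {
    "who.int": "Government or Public Health",
    "bbc.co.uk": "Mainstream or Local News",
    "youtube.com": "Social Media",
}


def has_tracked_hashtag(text):
    """Return True if the text contains at least one tracked hashtag."""
    tokens = {token.lower().rstrip(".,!?") for token in text.split()}
    return bool(tokens & TRACKED_HASHTAGS)


def domain_of(url):
    """Reduce a URL to its host name, dropping a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def rank_domains(records, top_n):
    """Count shared domains across matching tweets and return the top_n."""
    counts = Counter(
        domain_of(url)
        for record in records
        if has_tracked_hashtag(record["text"])
        for url in record["urls"]
    )
    return counts.most_common(top_n)


def category_shares(ranked):
    """Relative proportion of each category among the ranked resources."""
    totals = Counter(CATEGORY_BY_DOMAIN.get(domain, "Other") for domain, _ in ranked)
    n = sum(totals.values())
    return {category: count / n for category, count in totals.items()}


ranked = rank_domains(tweets, top_n=250)
shares = category_shares(ranked)
```

Counting each resource once, rather than weighting by how often it was shared, is one possible reading of "relative proportion of each category"; a share-weighted variant would sum the counts instead.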
Between February 9 and April 11, 2020, a total of 21,755,802 distinct tweets were collected, posted by 4,809,842 distinct Twitter accounts. The volume of #covid19-related tweets increased after the WHO announced the name of the new disease on February 11, 2020, and stabilized at the end of March at a high level. For the regional analysis, a higher tweet volume was observed in the vicinity of major European capitals or in densely populated areas. The most frequently shared resources originated from various social media platforms (ranks 1-7). The most prevalent category in the top 50 was "Mainstream or Local News." For the category "Government or Public Health," only two information sources were found in the top 50: US Centers for Disease Control and Prevention at rank 25 and the WHO at rank 27. The first occurrence of a prevalent scientific source was Nature (rank 116).
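A temporal analysis of tweet volume of this kind can be sketched as a count of tweets per calendar day. The sketch assumes ISO 8601 timestamps with an explicit UTC offset and bins by UTC date; the function name and the sample timestamps are hypothetical and are not data from the study.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_volume(timestamps):
    """Count tweets per UTC calendar day from ISO 8601 timestamp strings."""
    counts = Counter(
        datetime.fromisoformat(ts).astimezone(timezone.utc).date().isoformat()
        for ts in timestamps
    )
    # Sort by date so the result reads as a time series.
    return dict(sorted(counts.items()))


# Hypothetical timestamps; the third falls on February 11 once converted to UTC.
sample = [
    "2020-02-10T21:05:00+00:00",
    "2020-02-11T08:15:00+00:00",
    "2020-02-12T00:30:00+01:00",
]
volume = daily_volume(sample)
```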
The naming of the disease by the WHO was a major signal for addressing the public with public health responses via social media platforms such as Twitter. Future studies should focus on the origin and trustworthiness of shared resources, as monitoring the spread of fake news during a pandemic is of particular importance. In addition, it would be beneficial to analyze and uncover bot networks spreading COVID-19-related misinformation.