Lyu Hanjia, Chen Long, Wang Yu, Luo Jiebo
Goergen Institute for Data Science, University of Rochester, Rochester, NY 14627, USA.
Department of Computer Science, University of Rochester, Rochester, NY 14627, USA.
IEEE Trans Big Data. 2020 May 21;7(6):952-960. doi: 10.1109/TBDATA.2020.2996401. eCollection 2021 Dec.
As the 2019 novel coronavirus has spread worldwide, one controversial term, "Chinese Virus", is still being used by a great number of people, even though the WHO has officially named the disease COVID-19. In the meantime, global online media coverage of COVID-19-related racial attacks, most of which are anti-Chinese or anti-Asian, has increased steadily. As the pandemic becomes increasingly severe, more people are talking about it on social media platforms such as Twitter. They refer to COVID-19 in mainly two ways: using controversial terms like "Chinese Virus" or "Wuhan Virus", or using non-controversial terms like "Coronavirus". In this article, we attempt to characterize the Twitter users who use controversial terms and those who use non-controversial terms. We use the Tweepy API to retrieve 17 million related tweets and the information of their authors. We find significant differences between these two groups of Twitter users across their demographics, user-level features such as the number of followers, political following status, and geo-locations. Moreover, we apply classification models to predict which Twitter users are more likely to use controversial terms. To the best of our knowledge, this is the first large-scale social media-based study to characterize users with respect to their usage of controversial terms during a major crisis.
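The split into the two groups described in the abstract can be illustrated with a minimal sketch. It assumes that group membership is decided by simple keyword matching on tweet text, using only the three terms quoted in the abstract; the study's actual keyword lists and matching rules may differ. The names `label_tweet` and `squash` are hypothetical, and so are the choices to ignore case, spaces, and punctuation (so that hashtags such as "#ChineseVirus" match) and to let a controversial term take precedence when both kinds appear.

```python
import re
from collections import Counter

# Terms quoted in the abstract, written without spaces so that they can be
# compared against squashed text. The study's full keyword lists may differ.
CONTROVERSIAL = ("chinesevirus", "wuhanvirus")
NON_CONTROVERSIAL = ("coronavirus",)


def squash(text):
    """Lowercase the text and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def label_tweet(text):
    """Return 'controversial', 'non-controversial', or None for one tweet.

    Assumption: a controversial term takes precedence when a tweet
    contains terms of both kinds.
    """
    squashed = squash(text)
    if any(term in squashed for term in CONTROVERSIAL):
        return "controversial"
    if any(term in squashed for term in NON_CONTROVERSIAL):
        return "non-controversial"
    return None


if __name__ == "__main__":
    # Made-up example tweets, not data from the study.
    sample = [
        "The #ChineseVirus is spreading",
        "Coronavirus cases rise again",
        "Nice weather today",
    ]
    print(Counter(label_tweet(text) for text in sample))
```

Because spaces are removed before matching, this sketch can also match across word boundaries, so a real pipeline would likely need stricter tokenization.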