Aguilar-Gallegos Norman, Romero-García Leticia Elizabeth, Martínez-González Enrique Genaro, García-Sánchez Edgar Iván, Aguilar-Ávila Jorge
Centro de Investigaciones Económicas, Sociales y Tecnológicas de la Agroindustria y la Agricultura Mundial (CIESTAAM), Universidad Autónoma Chapingo (UACh), Chapingo, Estado de México, México.
Universidad Autónoma del Estado de México (UAEM), Estado de México, México.
Data Brief. 2020 May 8;30:105684. doi: 10.1016/j.dib.2020.105684. eCollection 2020 Jun.
In this data article, we provide a dataset of 8,982,694 Twitter posts about the global coronavirus health crisis. The data were collected through the Twitter REST API search, using the rtweet R package to download the raw data. The search term was "Coronavirus", which matched both the word itself and its hashtag form. We collected the data over 23 days, from January 21 to February 12, 2020. The dataset is multilingual, with English, Spanish, and Portuguese prevailing. We include a new variable, derived from four other variables, called the "type" of tweet; it is useful for showing the diversity of tweets and the dynamics of users on Twitter. The dataset comprises seven databases, which can be analysed separately. They can also be combined to support further research, including trends and relevance of different topics, types of tweets, the embeddedness of users and their profiles, retweet dynamics, hashtag analysis, and social network analysis. This dataset may interest researchers in different fields of knowledge, such as data science, social science, network science, health informatics, tourism, infodemiology, and others.
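As an illustration of how a tweet "type" variable could be derived from four other variables, the following is a minimal sketch only. The abstract does not name the four source variables or the resulting categories, so the field names (`is_retweet`, `is_quote`, `reply_to_status_id`, `reply_to_user_id`), the precedence order, the category labels, and the sample rows are all assumptions made for this example, not the authors' method. The last lines check the stated 23-day collection window.

```python
from collections import Counter
from datetime import date
from typing import Optional


def tweet_type(is_retweet: bool, is_quote: bool,
               reply_to_status_id: Optional[str],
               reply_to_user_id: Optional[str]) -> str:
    """Classify a tweet from four fields (assumed names, assumed precedence)."""
    # Assumption: a tweet counts as a reply if either reply field is set.
    is_reply = reply_to_status_id is not None or reply_to_user_id is not None
    if is_retweet:
        return "retweet"
    if is_quote:
        return "quote"
    if is_reply:
        return "reply"
    return "original"


# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"is_retweet": True, "is_quote": False,
     "reply_to_status_id": None, "reply_to_user_id": None},
    {"is_retweet": False, "is_quote": True,
     "reply_to_status_id": None, "reply_to_user_id": None},
    {"is_retweet": False, "is_quote": False,
     "reply_to_status_id": "123", "reply_to_user_id": "456"},
    {"is_retweet": False, "is_quote": False,
     "reply_to_status_id": None, "reply_to_user_id": None},
]

# Count how many sample tweets fall into each assumed type.
counts = Counter(tweet_type(**row) for row in sample)

# Inclusive length of the collection window stated in the abstract.
collection_days = (date(2020, 2, 12) - date(2020, 1, 21)).days + 1
```

Tabulating such a variable over the full dataset would give the distribution of tweet types; the actual categories should be taken from the dataset's own documentation.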