Parraga-Alava Jorge, Alcivar-Cevallos Roberth, Vaca-Cardenas Leticia, Meza Jaime
Facultad de Ciencias Informáticas, Universidad Técnica de Manabí, Avenida Jose María Urbina, Portoviejo 130104, Ecuador.
Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Av. Ecuador 3659, Santiago 9160000, Chile.
Data Brief. 2020 Dec 24;34:106693. doi: 10.1016/j.dib.2020.106693. eCollection 2021 Feb.
Recently, the use of citizen sensors (people who generate and share real data through social media) to detect and disseminate emergency events in real time has increased considerably, because people at the site of an event, as well as elsewhere, can quickly post relevant information about such alerts. Here, we present an emergency events dataset. The dataset contains over 25,500 Spanish-language texts posted on Twitter between January 19 and August 19, 2020, with emergency- and non-emergency-related content in Ecuador. We obtained, cleaned, and filtered these tweets, and then selected their location and temporal data as well as the tweet content. In addition, the dataset includes annotations of the tweet type (emergency / non-emergency), as well as additional nomenclature used to describe emergencies by the Center for immediate response service to emergencies (ECU 911) of Ecuador and by international emergency services agencies (ESAs). The dataset facilitates evaluating the performance of data science, machine learning, and natural language processing algorithms on supervised and unsupervised problems related to text mining and pattern recognition. The dataset is freely and publicly available at https://doi.org/10.17632/4x37zz82k8.
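Because each tweet is annotated as emergency or non-emergency, one natural supervised task is binary text classification. The following is a minimal sketch only, not the authors' method: it trains a multinomial naive Bayes baseline with add-one smoothing using only the Python standard library. It assumes tweets are available as plain strings paired with the labels "emergency" / "non-emergency"; the actual file format and column names of the released dataset are not described here, and the example tweets below are invented for illustration, not drawn from the dataset. The helper names `tokenize`, `train_nb`, and `predict_nb` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on Unicode word characters, so Spanish
    # accented letters (á, é, ñ, ...) stay inside their words.
    return re.findall(r"\w+", text.lower())


def train_nb(docs, labels):
    """Count documents per class and words per class for naive Bayes."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # skip words never seen in training
                # Add-one (Laplace) smoothed log likelihood P(tok | label)
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy tweets (not from the dataset).
train_docs = [
    "Accidente de tránsito en la vía Portoviejo Manta, hay heridos",
    "Incendio en un edificio del centro, llegan los bomberos",
    "Inundación en el barrio tras fuertes lluvias",
    "Hermoso atardecer en la playa hoy",
    "Feliz cumpleaños a mi mejor amiga",
    "Qué buen partido de fútbol anoche",
]
train_labels = ["emergency"] * 3 + ["non-emergency"] * 3

model = train_nb(train_docs, train_labels)
print(predict_nb(model, "Incendio forestal cerca de la carretera, se necesitan bomberos"))
# → emergency
print(predict_nb(model, "Qué hermoso día en la playa"))
# → non-emergency
```

Naive Bayes is used here only because it is a simple, dependency-free baseline; on the real dataset one would typically hold out a test split and compare it against stronger models.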