Mohanty Somya D, Biggers Brown, Sayedahmed Saed, Pourebrahim Nastaran, Goldstein Evan B, Bunch Rick, Chi Guangqing, Sadri Fereidoon, McCoy Tom P, Cosby Arthur
Department of Computer Science, University of North Carolina at Greensboro.
Department of Geography, Environment, and Sustainability, University of North Carolina at Greensboro.
Int J Disaster Risk Reduct. 2021 Feb 15;54:102032. doi: 10.1016/j.ijdrr.2020.102032. Epub 2021 Jan 11.
Streaming social media provides a real-time glimpse of extreme weather impacts. However, the volume of streaming data makes mining information a challenge for emergency managers, policy makers, and disciplinary scientists. Here we explore the effectiveness of data-learned approaches to mine and filter information from streaming social media data from Hurricane Irma's landfall in Florida, USA. We use 54,383 Twitter messages (out of 784K geolocated messages) from 16,598 users, posted Sept. 10-12, 2017, to develop 4 independent models that filter data for relevance: 1) a geospatial model based on forcing conditions at the place and time of each tweet, 2) an image classification model for tweets that include images, 3) a user model to predict the reliability of the tweeter, and 4) a text model to determine whether the text is related to Hurricane Irma. All four models are tested independently and can be combined to quickly filter and visualize tweets based on user-defined thresholds for each submodel. We envision that this type of filtering and visualization routine can serve as a base model for data capture from noisy sources such as Twitter. The filtered data can then be used by policy makers, environmental managers, emergency managers, and domain scientists interested in finding tweets with specific attributes for use during different stages of a disaster (e.g., preparedness, response, and recovery), or for detailed research.
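To illustrate the threshold-based combination of the four submodels described above, the following is a minimal Python sketch, not the authors' implementation. It assumes each tweet has already been assigned four scores in [0, 1] by the geospatial, image, user, and text models; all field names, the thresholds, and the default handling of image-free tweets are hypothetical.

```python
# Hedged sketch of the ensemble filter described in the abstract: a tweet is
# kept only if every submodel score meets its user-defined threshold.
from dataclasses import dataclass

@dataclass
class ScoredTweet:
    tweet_id: str
    geo_score: float    # geospatial model: forcing conditions at tweet place/time
    image_score: float  # image classifier (assumption: tweets with no image get 1.0)
    user_score: float   # user-reliability model
    text_score: float   # text relevance to Hurricane Irma

def passes_filter(t: ScoredTweet, thresholds: dict[str, float]) -> bool:
    """Keep a tweet only if every submodel score meets its threshold."""
    return (t.geo_score >= thresholds["geo"]
            and t.image_score >= thresholds["image"]
            and t.user_score >= thresholds["user"]
            and t.text_score >= thresholds["text"])

# Example: stricter user/text cutoffs, looser geospatial/image cutoffs
# (threshold values are illustrative, not taken from the paper).
thresholds = {"geo": 0.3, "image": 0.5, "user": 0.7, "text": 0.8}
tweets = [ScoredTweet("t1", 0.9, 0.6, 0.8, 0.95),
          ScoredTweet("t2", 0.2, 0.9, 0.9, 0.90)]
relevant = [t for t in tweets if passes_filter(t, thresholds)]
print([t.tweet_id for t in relevant])  # -> ['t1']
```

Because each submodel is tested independently, end users can tune the four thresholds separately, for example relaxing the geospatial cutoff during the response stage while keeping the text-relevance cutoff strict.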