Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia.
School of Architecture and Built Environment, Queensland University of Technology, 2 George Street, Brisbane 4000, QLD, Australia.
Sensors (Basel). 2021 Apr 24;21(9):2993. doi: 10.3390/s21092993.
Digital societies can be characterized by their increasing desire to express themselves and interact with others. This desire is realized through digital platforms such as social media, which have become convenient and inexpensive sensors compared with physical sensors in many sectors of smart societies. One such major sector is road transportation, which is the backbone of modern economies and accounts globally for 1.25 million deaths and 50 million injuries annually. The state of the art in big data-enabled social media analytics for transportation-related studies is limited. This paper brings a range of technologies together to detect road traffic-related events using big data and distributed machine learning. The most specific contribution of this research is an automatic labelling method for machine learning-based detection of traffic-related events from Twitter data in the Arabic language. The proposed method has been implemented in a software tool called Iktishaf+ (an Arabic word meaning discovery) that detects traffic events automatically from Arabic-language tweets using distributed machine learning over Apache Spark. The tool is built from nine components and a range of technologies including Apache Spark, Parquet, and MongoDB. Iktishaf+ uses a light stemmer for the Arabic language that we developed. We also use a location extractor that we developed, which allows us to extract and visualize spatio-temporal information about the detected events. The data used in this work comprise 33.5 million tweets collected from Saudi Arabia using the Twitter API. Using classifiers based on support vector machines, naïve Bayes, and logistic regression, we detect and validate several real events in Saudi Arabia without prior knowledge of them, including a fire in Jeddah, rains in Makkah, and an accident in Riyadh. The findings show the effectiveness of Twitter in detecting important events with no prior knowledge about them.
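The abstract does not specify how the automatic labelling works, so the following is only a minimal single-machine sketch of the general idea of labelling tweets automatically and then training a classifier on those labels. It assumes a keyword-lexicon labelling rule, which is an illustrative stand-in and not the method used in Iktishaf+. The seed lexicon `TRAFFIC_TERMS`, the helper names, and the four toy tweets are all hypothetical, and the sketch uses whitespace tokenization and plain Python in place of the light stemmer and Apache Spark that the tool uses.

```python
import math
from collections import Counter

# Hypothetical seed lexicon (accident, congestion, traffic). This is an
# assumption for illustration; the paper's labelling rules are not given here.
TRAFFIC_TERMS = {"حادث", "ازدحام", "مرور"}


def auto_label(tokens):
    """Weak label: 1 (traffic-related) if any seed term appears, else 0."""
    return 1 if TRAFFIC_TERMS & set(tokens) else 0


def train_nb(docs, labels):
    """Collect the counts needed by a multinomial naive Bayes classifier."""
    vocab = {token for doc in docs for token in doc}
    token_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
    return vocab, token_counts, class_counts


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    vocab, token_counts, class_counts = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        if class_counts[label] == 0:
            continue  # a class with no training examples cannot be scored
        total = sum(token_counts[label].values())
        score = math.log(class_counts[label] / n_docs)
        for token in tokens:
            if token in vocab:  # tokens never seen in training are ignored
                score += math.log(
                    (token_counts[label][token] + 1) / (total + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy tweets, invented for this sketch.
tweets = [
    "حادث على طريق الملك",          # accident on the King road
    "ازدحام شديد على طريق المطار",  # heavy congestion on the airport road
    "مباراة جميلة اليوم",           # a nice match today
    "قهوة الصباح مع الأصدقاء",      # morning coffee with friends
]
docs = [tweet.split() for tweet in tweets]
labels = [auto_label(doc) for doc in docs]
model = train_nb(docs, labels)

# "airport road" contains no seed term, but its words co-occur with them.
road_prediction = predict_nb(model, "طريق المطار".split())
match_prediction = predict_nb(model, "مباراة اليوم".split())
```

In this toy run the classifier assigns the traffic class to a tweet that contains no seed term, because its words co-occur with seed terms in the automatically labelled training data. That is the usual motivation for training a model on weak labels rather than applying the lexicon directly.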