Development Impact Evaluation Department, World Bank, Washington, DC, United States of America.
School of Architecture and Planning, Massachusetts Institute of Technology, Cambridge, MA, United States of America.
PLoS One. 2021 Feb 3;16(2):e0244317. doi: 10.1371/journal.pone.0244317. eCollection 2021.
With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. What makes this frustrating is that private companies hold potentially useful data, but it is not accessible to the people who could use it to track poverty, reduce disease, or build urban infrastructure. This project set out to test whether an openly available dataset (Twitter) can be transformed into a resource for urban planning and development. We test our hypothesis by creating road traffic crash location data, which are scarce in most resource-poor environments but essential for addressing the number one cause of mortality for children over five and young adults. The research project scraped 874,588 traffic-related tweets in Nairobi, Kenya, applied a machine learning model to capture the occurrence of a crash, and developed an improved geoparsing algorithm to identify its location. We geolocate 32,991 crash reports on Twitter for 2012-2020 and cluster them into 22,872 unique crashes during this period. For a subset of crashes reported on Twitter, a motorcycle delivery service was dispatched in real time to verify the crash and its location; the results show 92% accuracy. To our knowledge, this is the first geolocated dataset of crashes for the city, and it allowed us to produce the first crash map for Nairobi. Using a spatial clustering algorithm, we are able to locate the portions of the road network (<1%) where 50% of the identified crashes occurred. Even with limitations in the representativeness of the data, the results can give urban planners useful information for targeting road safety improvements where resources are limited. The work shows how Twitter data might be used to create other types of essential data for urban planning in resource-poor environments.
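The abstract does not say how geolocated reports are merged into unique crashes. As a rough illustration only, the following sketch links any two reports that fall within a distance threshold and a time window of each other, then treats each connected group as one crash. The `Report` fields, the function names, and the 500 m and 240 minute thresholds are all hypothetical choices made for this example, not values or methods taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One geolocated crash report (field names are hypothetical)."""
    lat: float
    lon: float
    t_min: float  # minutes since an arbitrary reference time


def haversine_m(a: Report, b: Report) -> float:
    """Great-circle distance between two reports, in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def cluster_reports(reports, max_dist_m=500.0, max_gap_min=240.0):
    """Group reports into crashes by single linkage in space and time.

    Two reports are linked when they lie within max_dist_m metres and
    max_gap_min minutes of each other; linked reports share a cluster.
    Both thresholds are illustrative assumptions, not values from the paper.
    Returns one integer cluster label per report.
    """
    parent = list(range(len(reports)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(reports)):
        for j in range(i + 1, len(reports)):
            close_in_time = abs(reports[i].t_min - reports[j].t_min) <= max_gap_min
            if close_in_time and haversine_m(reports[i], reports[j]) <= max_dist_m:
                parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(reports))]


reports = [
    Report(-1.2921, 36.8219, 0),     # first report of a crash
    Report(-1.2923, 36.8221, 30),    # about 30 m away, 30 minutes later
    Report(-1.2921, 36.8219, 1000),  # same spot, but many hours later
    Report(-1.2500, 36.9000, 10),    # several kilometres away
]
labels = cluster_reports(reports)
```

The pairwise loop is quadratic in the number of reports, which is acceptable for a small illustration; for tens of thousands of reports a spatial index or a library clustering routine would be the more practical choice.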