Wang Hongjian, Yao Huaxiu, Kifer Daniel, Graif Corina, Li Zhenhui
College of Information Sciences and Technology, Pennsylvania State University.
Department of Computer Science and Engineering, Pennsylvania State University.
IEEE Trans Big Data. 2019 Jun;5(2):180-194. doi: 10.1109/TBDATA.2017.2786405. Epub 2017 Dec 22.
Crime is one of the most important social problems in the country, affecting public safety, child development, and adult socioeconomic status. Understanding which factors cause higher crime rates is critical for policy makers in their efforts to reduce crime and improve citizens' quality of life. We tackle a fundamental problem in this paper: crime rate inference at the neighborhood level. Traditional approaches have used demographics and geographical influences to estimate crime rates in a region. With the rapid development of positioning technology and the prevalence of mobile devices, large amounts of modern urban data have been collected, and such big data can provide new perspectives for understanding crime. In this paper, we use large-scale point-of-interest data and taxi flow data from the city of Chicago, IL, USA. We observe significantly improved performance in crime rate inference compared to using traditional features alone, and the improvement is consistent over multiple years. We also show that these new features are significant in the feature importance analysis. The correlations between crime and the various observed features are not constant across the city. To address this geospatial non-stationarity, we further apply geographically weighted regression on top of the negative binomial model (GWNBR). Experiments show that GWNBR outperforms the negative binomial model.
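The idea of layering geographically weighted regression on a negative binomial model can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: a Gaussian distance kernel with a fixed `bandwidth`, a fixed dispersion parameter `alpha` (NB2 variance `mu + alpha * mu**2`), a log link, and fitting by iteratively reweighted least squares. All function names are hypothetical. For each region, the other regions are down-weighted by distance, and a separate coefficient vector is fitted, so the crime-feature relationship is allowed to vary across the city.

```python
import math

def gaussian_weights(coords, i, bandwidth):
    """Kernel weight of every region relative to region i (assumed Gaussian kernel)."""
    xi, yi = coords[i]
    return [math.exp(-0.5 * ((x - xi) ** 2 + (y - yi) ** 2) / bandwidth ** 2)
            for x, y in coords]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_weighted_nb(X, y, w, alpha=0.5, n_iter=50, tol=1e-8):
    """Weighted NB2 regression (log link, fixed alpha) fitted by IRLS.

    Column 0 of X is assumed to be the intercept column of ones.
    """
    p = len(X[0])
    beta = [0.0] * p
    # Start from the weighted mean of y on the log scale.
    beta[0] = math.log(sum(wi * yi for wi, yi in zip(w, y)) / sum(w) + 1e-8)
    for _ in range(n_iter):
        A = [[0.0] * p for _ in range(p)]
        b = [0.0] * p
        for row, yi, wi in zip(X, y, w):
            eta = sum(r * bj for r, bj in zip(row, beta))
            eta = max(-20.0, min(20.0, eta))      # guard against overflow
            mu = math.exp(eta)
            W = wi * mu / (1.0 + alpha * mu)      # kernel weight times IRLS weight
            z = eta + (yi - mu) / mu              # working response
            for j in range(p):
                b[j] += W * row[j] * z
                for k in range(p):
                    A[j][k] += W * row[j] * row[k]
        new_beta = solve(A, b)
        done = max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol
        beta = new_beta
        if done:
            break
    return beta

def gwnbr(X, y, coords, bandwidth, alpha=0.5):
    """Return one locally fitted coefficient vector per region."""
    return [fit_weighted_nb(X, y, gaussian_weights(coords, i, bandwidth), alpha)
            for i in range(len(y))]
```

Calling `gwnbr(X, y, coords, bandwidth)` with a feature matrix, crime counts, and region centroids returns one coefficient vector per region. A very large bandwidth makes all weights close to 1 and recovers a single global negative binomial fit. In practice the bandwidth and dispersion would be estimated from data, for example by cross-validation, rather than fixed as they are here.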