Bappee Fateha Khanam, Soares Amilcar, Petry Lucas May, Matwin Stan
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada.
Department of Computer Science, Memorial University of Newfoundland, St. John's, Canada.
J Big Data. 2021;8(1):96. doi: 10.1186/s40537-021-00489-9. Epub 2021 Jul 3.
Urban data such as demographics, infrastructure, and criminal records are becoming increasingly accessible to researchers. This has improved quantitative crime research, which aims to predict future crime occurrence by identifying the factors and knowledge in the data that contribute to criminal activity. Although crime is distributed asymmetrically across geographic space, analogous, implicit criminogenic factors are often hidden in the data. Moreover, because data are less available and less comprehensive for some regions, especially smaller cities, it is challenging to build a uniform framework for all geographic regions. This paper addresses the crime prediction task from a cross-domain perspective to tackle the data insufficiency problem in a small city. We create a uniform outline for Halifax, Nova Scotia, Canada, by adapting and learning knowledge from two source domains, Toronto and Vancouver, whose data follow distributions that differ from, but are related to, that of Halifax. To transfer knowledge between the source and target domains, we propose applying instance-based transfer learning settings. Each setting learns knowledge from a seasonal perspective with cross-domain data fusion. We use ensemble learning methods for model building because of their ability to generalize to new data. We evaluate classification performance for both single- and multi-domain representations and compare the results with baseline models. Our findings show that the proposed data-driven approach, which integrates multiple data sources, performs satisfactorily.
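The abstract does not name a specific instance-based transfer algorithm. As an illustration only, the sketch below implements TrAdaBoost (Dai et al., 2007), a well-known instance-based transfer learning method that is itself a boosting ensemble. Source-domain instances that disagree with the target data are progressively down-weighted, while misclassified target instances are up-weighted. It is not presented as the authors' method. The decision-stump weak learner, the number of rounds, and the helper names `tradaboost`, `fit_stump`, `stump_predict`, and `predict` are hypothetical. Inputs are assumed to be numeric feature vectors with binary labels (e.g. crime vs. no crime in a spatial unit).

```python
# Illustrative sketch only: hypothetical helpers, not the paper's implementation.
import math
from typing import List, Sequence, Tuple

# (feature index, threshold, polarity): a hypothetical weak learner
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Predict 1 on the 'positive' side of the threshold, else 0."""
    j, thr, pol = stump
    above = x[j] > thr
    return int(above) if pol == 1 else int(not above)


def fit_stump(X: List[Sequence[float]], y: List[int], p: List[float]) -> Stump:
    """Exhaustively choose the stump with the lowest p-weighted error."""
    best: Stump = (0, 0.0, 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: one below the minimum, then every observed value.
        for thr in [values[0] - 1.0] + values:
            for pol in (1, -1):
                err = sum(pi for xi, yi, pi in zip(X, y, p)
                          if stump_predict((j, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best


def tradaboost(Xs, ys, Xt, yt, n_rounds: int = 20) -> List[Tuple[Stump, float]]:
    """TrAdaBoost with decision stumps; labels are in {0, 1}.

    Xs, ys: source-domain instances (e.g. pooled data from larger cities).
    Xt, yt: the small labelled target-domain set (e.g. a smaller city).
    Returns (stump, beta_t) pairs from the second half of the rounds.
    """
    n, m = len(Xs), len(Xt)
    X, y = list(Xs) + list(Xt), list(ys) + list(yt)
    w = [1.0] * (n + m)
    # Fixed down-weighting factor for misclassified source instances.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    rounds = []
    for _ in range(n_rounds):
        total = sum(w)
        w = [wi / total for wi in w]  # normalise to a distribution p_t
        h = fit_stump(X, y, w)
        miss = [abs(stump_predict(h, xi) - yi) for xi, yi in zip(X, y)]
        # The weighted error is measured on target-domain instances only.
        eps = sum(w[i] * miss[i] for i in range(n, n + m)) / sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # keep beta_t in (0, 1)
        beta_t = eps / (1.0 - eps)
        for i in range(n):  # shrink misclassified source instances
            w[i] *= beta_src ** miss[i]
        for i in range(n, n + m):  # boost misclassified target instances
            w[i] *= beta_t ** (-miss[i])
        rounds.append((h, beta_t))
    # Keep rounds ceil(N/2)..N (1-indexed), as in the TrAdaBoost output rule.
    return rounds[math.ceil(n_rounds / 2) - 1:]


def predict(rounds: List[Tuple[Stump, float]], x: Sequence[float]) -> int:
    """Weighted vote of the retained stumps (TrAdaBoost decision rule)."""
    score = sum(-math.log(b) * stump_predict(h, x) for h, b in rounds)
    half = sum(-math.log(b) * 0.5 for _, b in rounds)
    return int(score >= half)
```

Measuring the error on target instances only is what distinguishes this from plain AdaBoost on pooled data. Pooled source cities can contribute where they agree with the target city, but they cannot dominate where their distributions diverge.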