ENAC, HERUS Lab, Ecole polytechnique federale de Lausanne, IIE, Lausanne, Switzerland.
University of Turin, Turin, Italy.
PLoS One. 2021 Mar 3;16(3):e0246785. doi: 10.1371/journal.pone.0246785. eCollection 2021.
The availability of reliable socioeconomic data is critical for designing urban policies and implementing location-based services; however, the temporal and geographical coverage of such data is often limited. We explore the potential of insurance customer data for predicting socioeconomic indicators of Swiss municipalities. First, we define a feature space by aggregating individual customer data at the city level along several behavioral and user-profile dimensions. Second, we collect official statistics published by the Swiss authorities across a wide range of categories: Population, Transportation, Work, Space and Territory, Housing, and Economy. Third, we adopt two spatial regression models, capturing both global and local geographical dependencies, to assess how predictable these indicators are. Results consistently show a correlation between insurance customer characteristics and official socioeconomic indexes. Performance varies by category, reaching R² > 0.6 for several target variables under 5-fold cross-validation. As a case study, we focus on predicting the percentage of the population using public transportation and discuss the implications at the regional scale. We believe this methodology can support official statistical offices and could open new opportunities for characterizing socioeconomic traits at highly granular spatial and temporal scales.
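The pipeline summarized above (aggregate individual customer records per municipality, regress an official indicator on the aggregated features, score the model by R² under 5-fold cross-validation) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it uses plain ordinary least squares as a stand-in for the two spatial regression models, which are not specified here; it averages per-fold R² (one of several reasonable conventions); and the function names, feature layout, and synthetic data are invented for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def aggregate_by_municipality(
    records: Iterable[Tuple[str, Sequence[float]]]
) -> Dict[str, List[float]]:
    """Average individual customer feature vectors within each municipality."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for muni, feats in records:
        acc = sums.setdefault(muni, [0.0] * len(feats))
        sums[muni] = [s + f for s, f in zip(acc, feats)]
        counts[muni] += 1
    return {m: [s / counts[m] for s in v] for m, v in sums.items()}


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least squares with an intercept (a non-spatial stand-in model), solved from
    the normal equations by Gauss-Jordan elimination. Returns [b0, b1, ..., bp]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        pivot_val = A[col][col]
        A[col] = [v / pivot_val for v in A[col]]
        for k in range(p):
            if k != col:
                factor = A[k][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]


def predict(coef: List[float], X: List[List[float]]) -> List[float]:
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]


def r2_score(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_r2(X: List[List[float]], y: List[float], k: int = 5, seed: int = 0) -> float:
    """Mean held-out R^2 over k shuffled folds (k=5 mirrors the 5-fold setup)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        preds = predict(coef, [X[i] for i in fold])
        scores.append(r2_score([y[i] for i in fold], preds))
    return sum(scores) / k


if __name__ == "__main__":
    # Hypothetical synthetic data: 60 municipalities, 30 customers each,
    # two made-up customer features in [0, 1].
    rng = random.Random(42)
    customers = [
        (f"muni_{m}", [rng.random(), rng.random()])
        for m in range(60)
        for _ in range(30)
    ]
    features = aggregate_by_municipality(customers)
    names = sorted(features)
    X = [features[n] for n in names]
    # Made-up target (e.g. a public-transport share): linear in the features plus noise.
    y = [0.2 + 0.6 * x1 - 0.3 * x2 + rng.gauss(0, 0.01) for x1, x2 in X]
    print(f"5-fold mean R^2: {kfold_r2(X, y):.3f}")
```

Replacing `fit_ols` with a spatial model (for example one that adds a spatially lagged target, or fits coefficients locally per region) would leave the aggregation and cross-validation scaffolding unchanged, which is why the sketch keeps those steps separate.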