Buck Christoph, Dreger Steffen, Pigeot Iris
Leibniz Institute for Prevention Research and Epidemiology-BIPS, Bremen, Germany.
BMJ Open. 2015 Mar 9;5(3):e006481. doi: 10.1136/bmjopen-2014-006481.
Data privacy is a major concern in spatial epidemiology because exact residential locations or parts of participants' addresses, such as street names or zip codes, are used to perform geospatial analyses. To address this concern, data are commonly aggregated to units such as census districts or zip code areas, although any spatial aggregation leads to a loss of spatial variability. For the assessment of urban opportunities for physical activity conducted in the IDEFICS (Identification and prevention of dietary- and lifestyle-induced health effects in children and infants) study, macro-level analyses were performed, but the responsible office for data protection did not permit the use of exact residential addresses for micro-level analyses. We therefore implemented spatial blurring that anonymises address coordinates as a function of the underlying population density.
We added a Gaussian-distributed error to individual address coordinates, with the variance σ² depending on the population density and on the chosen k-anonymity. A total of 1000 random point locations were generated, and each was blurred 100 times to obtain anonymised locations. For each location, a 1 km network-dependent neighbourhood was used to calculate walkability indices. Indices of blurred locations were compared with indices based on their sampling origins to determine the effect of spatial blurring on the assessment of the built environment.
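The blurring step can be illustrated with a minimal sketch. The abstract does not state how σ was derived from population density and k, so the rule below is an assumption made purely for illustration: σ is set to the radius of a circle expected to contain k residents at the local density. The function names `blur_sigma_m` and `blur_points`, the default k = 5, and the use of projected coordinates in metres are likewise hypothetical and not taken from the study.

```python
import numpy as np

def blur_sigma_m(density_per_km2, k):
    """ASSUMED rule (not from the study): sigma in metres equals the radius
    of a circle that is expected to contain k residents at this density."""
    area_km2 = k / np.asarray(density_per_km2, dtype=float)
    return 1000.0 * np.sqrt(area_km2 / np.pi)

def blur_points(xy_m, density_per_km2, k=5, n_rep=100, seed=0):
    """Add zero-mean Gaussian noise to projected coordinates (metres).

    xy_m            : array of shape (n, 2), one row per address
    density_per_km2 : array of shape (n,), residents per km^2 at each address
    Returns an array of shape (n_rep, n, 2) with n_rep blurred copies.
    """
    rng = np.random.default_rng(seed)
    xy_m = np.asarray(xy_m, dtype=float)
    sigma = blur_sigma_m(density_per_km2, k)            # shape (n,)
    # Draw standard normal errors, then scale each address by its own sigma.
    noise = rng.standard_normal((n_rep,) + xy_m.shape)
    return xy_m[None, :, :] + noise * sigma[None, :, None]

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    points = rng.uniform(0, 10_000, size=(1000, 2))     # 1000 random locations
    density = rng.uniform(100, 3000, size=1000)         # illustrative densities
    blurred = blur_points(points, density, k=5, n_rep=100)
    shift = np.linalg.norm(blurred - points[None], axis=2).mean(axis=0)
    dense = density >= 1500
    print("mean shift (m), dense areas :", shift[dense].mean())
    print("mean shift (m), sparse areas:", shift[~dense].mean())
```

Under this assumed rule, σ shrinks with the square root of population density, so addresses in dense areas are displaced less than addresses in sparse areas. Walkability indices would then be recomputed for each blurred copy and compared with the index at the sampling origin.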
The magnitude of spatial blurring decreased with increasing population density, as did the mean differences in walkability indices. In densely populated areas with at least 1500 residents per km² in particular, differences between blurred locations and their sampling origins were small, and spatial blurring did not affect the assessment of the built environment.
This approach allowed the built environment to be investigated at the micro level using individual network-dependent neighbourhoods while meeting data protection requirements. Spatial blurring had only a minor influence on the assessment of walkability, slightly affecting the assessment of the built environment in sparsely populated areas.