Research Methodology Group, University of Duisburg-Essen, Duisburg, Germany.
Methodology R&D, Statistics Netherlands (CBS), Heerlen, The Netherlands.
Int J Health Geogr. 2021 Mar 20;20(1):14. doi: 10.1186/s12942-021-00268-y.
We introduce and study a recently proposed method for privacy-preserving distance computations that has so far received little attention in the scientific literature. The method, which is based on intersecting sets of randomly labeled grid points and is henceforth denoted ISGP, allows approximate distances between masked spatial data to be calculated. Coordinates are replaced by sets of hash values. The method allows distances between locations L to be computed even when the locations at different points in time t are not known simultaneously. The distance between [Formula: see text] and [Formula: see text] can be computed even when [Formula: see text] does not exist at [Formula: see text] and [Formula: see text] has been deleted at [Formula: see text]. An example would be patients from a medical data set and the locations of their later hospitalizations. ISGP is a new tool for privacy-preserving handling of geo-referenced data sets in general. Furthermore, the technique can be used to include geographical identifiers as additional information in privacy-preserving record linkage. To show that the technique can be implemented in most high-level programming languages with a few lines of code, a complete implementation in the statistical programming language R is given. The properties of the method are explored using simulations based on large-scale real-world data of hospitals ([Formula: see text]) and residential locations ([Formula: see text]). The method has already been used in a real-world application.
ISGP yields very accurate results. Our simulation study showed that, with appropriately chosen parameters, the approximated distances reach 99% accuracy.
We discussed a new method for privacy-preserving distance computations in microdata. The method is highly accurate and fast, has a low computational burden, and does not require excessive storage.
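The idea of replacing coordinates by sets of hash values and estimating distances from set intersections can be illustrated with a minimal Python sketch. This is not the R implementation given in the paper. It rests on assumptions that the abstract does not state: each location is represented by the labels of all grid points within a fixed radius, a keyed hash (HMAC-SHA256) stands in for the random labeling, and the distance is recovered by inverting the area of overlap of two equal-radius disks. The function names and the parameters `radius`, `spacing`, and `secret` are hypothetical.

```python
import hashlib
import hmac
import math


def grid_labels(x, y, radius, spacing, secret):
    """Return the set of keyed-hash labels of all grid points within
    `radius` of (x, y). Only this set is stored, not the coordinates."""
    labels = set()
    i_min = math.ceil((x - radius) / spacing)
    i_max = math.floor((x + radius) / spacing)
    j_min = math.ceil((y - radius) / spacing)
    j_max = math.floor((y + radius) / spacing)
    for i in range(i_min, i_max + 1):
        for j in range(j_min, j_max + 1):
            gx, gy = i * spacing, j * spacing
            if (gx - x) ** 2 + (gy - y) ** 2 <= radius ** 2:
                # Assumption: a keyed hash of the grid index plays the
                # role of the random label of this grid point.
                msg = f"{i}:{j}".encode()
                labels.add(hmac.new(secret, msg, hashlib.sha256).hexdigest())
    return labels


def lens_area(d, r):
    """Area of overlap of two disks of radius r whose centres are d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)


def estimate_distance(labels_a, labels_b, radius, spacing):
    """Estimate the distance between two masked locations from the number
    of shared labels. Returns None when the sets do not intersect, i.e.
    when the locations are too far apart for this radius."""
    shared = len(labels_a & labels_b)
    if shared == 0:
        return None
    # Each grid point represents spacing**2 units of area.
    target = shared * spacing ** 2
    # lens_area decreases monotonically in d, so bisection inverts it.
    lo, hi = 0.0, 2.0 * radius
    for _ in range(60):
        mid = (lo + hi) / 2
        if lens_area(mid, radius) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    key = b"shared-secret"  # hypothetical key known only to the data holders
    home = grid_labels(1000, 2000, radius=1000, spacing=10, secret=key)
    hospital = grid_labels(1300, 2400, radius=1000, spacing=10, secret=key)
    # The true Euclidean distance between the two points is 500.
    print(estimate_distance(home, hospital, radius=1000, spacing=10))
```

In this sketch the two label sets can be created at different times and by different parties, as long as both use the same grid, radius, and key. The radius bounds the largest distance that can be estimated (twice the radius), and the grid spacing trades storage for accuracy.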