Yin Ling, Wang Qian, Shaw Shih-Lung, Fang Zhixiang, Hu Jinxing, Tao Ye, Wang Wei
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China.
Department of Geography, The University of Tennessee, Knoxville, TN, United States of America; State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, Hubei, China.
PLoS One. 2015 Oct 15;10(10):e0140589. doi: 10.1371/journal.pone.0140589. eCollection 2015.
Mobile phone location data is a newly emerging data source with great potential to support human mobility research. However, recent studies have indicated that many users can be easily re-identified from their unique activity patterns. Privacy protection procedures usually alter the original data and reduce its utility for analysis. Meeting the need for detailed data in activity analysis while avoiding potential privacy risks is therefore a challenge. The aim of this study is to reveal the re-identification risks for mobile users in a Chinese city and to examine the quantitative relationship between re-identification risk and data utility for an aggregated mobility analysis. The first step is to apply two reported attack models, the top N locations and the spatio-temporal points, to evaluate the re-identification risks in Shenzhen City, a metropolis in China. A spatial generalization approach to protecting privacy is then proposed and implemented, and spatially aggregated analysis is used to assess the loss of data utility after privacy protection. The results demonstrate that the re-identification risks in Shenzhen City are clearly different from those reported for regions in Western countries, which demonstrates the spatial heterogeneity of re-identification risks in mobile phone location data. A uniform mathematical relationship has also been found between re-identification risk (x) and data utility (y) for both attack models: y = -ax^b + c (a, b, c > 0; 0 < x < 1), where the exponent b increases with the background knowledge of the attackers. The discovered mathematical relationship provides data publishers with useful guidance on choosing the right tradeoff between privacy and utility. Overall, this study contributes to a better understanding of re-identification risks and provides a privacy-utility tradeoff benchmark for improving privacy protection when sharing detailed trajectory data.
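The top N locations attack and the effect of spatial generalization can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the (user, (x, y)) record layout, the planar integer coordinates, the square grid, and the choice to treat the top N locations as an ordered tuple are all assumptions made for illustration. Risk is taken here as the fraction of users whose top N signature is shared with no other user.

```python
from collections import Counter


def top_n_locations(records, n):
    """Map each user to a tuple of their n most visited locations.

    records: iterable of (user, location) pairs (hypothetical layout).
    Ties in visit count are broken by location value so the result is deterministic.
    """
    visits = {}
    for user, loc in records:
        visits.setdefault(user, Counter())[loc] += 1
    return {
        user: tuple(
            loc
            for loc, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        )
        for user, c in visits.items()
    }


def reidentification_risk(records, n):
    """Fraction of users whose ordered top-n signature is unique (assumed risk measure)."""
    signatures = top_n_locations(records, n)
    signature_counts = Counter(signatures.values())
    unique_users = sum(1 for s in signatures.values() if signature_counts[s] == 1)
    return unique_users / len(signatures)


def generalize(records, cell):
    """Spatial generalization: snap each (x, y) point to a square grid cell of side `cell`."""
    return [(user, (x // cell, y // cell)) for user, (x, y) in records]


# Toy data: three users, each with a most visited and a second most visited point.
records = [
    ("a", (0, 0)), ("a", (0, 0)), ("a", (5, 5)),
    ("b", (1, 1)), ("b", (1, 1)), ("b", (6, 6)),
    ("c", (9, 9)), ("c", (9, 9)), ("c", (3, 3)),
]

for cell in (1, 5, 10):
    risk = reidentification_risk(generalize(records, cell), n=2)
    print(f"cell size {cell}: risk {risk:.2f}")
```

Coarser cells merge more users into the same signature, so the measured risk falls as the cell size grows. The same coarsening blurs the spatial detail available for aggregated analysis, which is the tradeoff the fitted relationship y = -ax^b + c describes.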