Hripcsak George, Mirhaji Parsa, Low Alexander FH, Malin Bradley A
Department of Biomedical Informatics, Columbia University Medical Center, New York, NY 10032, USA
Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY 10461, USA
J Am Med Inform Assoc. 2016 Nov;23(6):1040-1045. doi: 10.1093/jamia/ocw001. Epub 2016 Mar 24.
Maintaining patient privacy is a challenge in large-scale observational research. To assist in reducing the risk of identifying study subjects through publicly available data, we introduce a method for obscuring date information for clinical events and patient characteristics.
The method, which we call Shift and Truncate (SANT), obscures date information to any desired granularity. Shift and Truncate first assigns each patient a random shift value, such that all dates in that patient's record are shifted by that amount. Data are then truncated from the beginning and end of the data set.
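The two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `shift_and_truncate`, its parameters, the uniform whole-day shift drawn from `0` to `window_days - 1`, and the exact retained window (from `start + window_days - 1` through `end`) are assumptions chosen for illustration, since the abstract states only that data are truncated at the beginning and end.

```python
import random
from datetime import date, timedelta


def shift_and_truncate(records, start, end, window_days, seed=None):
    """Illustrative Shift and Truncate sketch (hypothetical helper).

    records: dict mapping patient id -> list of datetime.date values,
             all assumed to lie in [start, end].
    window_days: desired granularity in days (assumed parameter).
    """
    rng = random.Random(seed)

    # Assumed retained window: a shifted date kept here could have come from
    # any of the window_days possible shifts, because every candidate original
    # date (shifted date minus 0..window_days-1 days) lies inside [start, end].
    keep_from = start + timedelta(days=window_days - 1)
    keep_to = end

    result = {}
    for patient_id, dates in records.items():
        # Shift: one random offset per patient, applied to all of that
        # patient's dates, so intervals within a record are preserved.
        shift = timedelta(days=rng.randrange(window_days))
        moved = [d + shift for d in dates]
        # Truncate: drop shifted dates outside the retained window.
        result[patient_id] = [d for d in moved if keep_from <= d <= keep_to]
    return result


if __name__ == "__main__":
    demo = {"p1": [date(2015, 6, 1), date(2015, 6, 11)]}
    print(shift_and_truncate(demo, date(2015, 1, 1), date(2015, 12, 31), 30))
```

In a data set that is refreshed repeatedly, each patient's shift would need to be stored and reused rather than redrawn on every run; the sketch omits that bookkeeping.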
The data set can be proven to not disclose temporal information finer than the chosen granularity. Unlike previous strategies such as a simple shift, SANT remains robust to frequent (even daily) updates and to attempts to infer dates at the beginning and end of date-shifted data sets. Time-of-day may be retained or obscured, depending on the goal and anticipated knowledge of the data recipient.
The method can be useful as a scientific approach for reducing re-identification risk under the Privacy Rule of the Health Insurance Portability and Accountability Act and may contribute to qualification for the Safe Harbor implementation.