Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee 37203, USA.
J Am Med Inform Assoc. 2011 Jan-Feb;18(1):3-10. doi: 10.1136/jamia.2010.004622.
Healthcare organizations must de-identify patient records before sharing data. Many organizations rely on the Safe Harbor Standard of the HIPAA Privacy Rule, which enumerates 18 identifiers that must be suppressed (eg, ages over 89). An alternative model in the Privacy Rule, known as the Statistical Standard, can facilitate the sharing of more detailed data, but is rarely applied because of a lack of published methodologies. The authors propose an intuitive approach to de-identifying patient demographics in accordance with the Statistical Standard.
The authors analyze the demographics of patient cohorts from five medical centers, developed for the NIH-sponsored Electronic Medical Records and Genomics network, against US census data. They report the re-identification risk of patient demographics disclosed under the Safe Harbor policy and the relative risk rate for sharing such information under alternative policies.
The re-identification risk of Safe Harbor demographics ranged from 0.01% to 0.19%. The findings show that alternative de-identification models can be created with risks no greater than Safe Harbor. The authors illustrate that patient ages over 89 can be disclosed when other features are reduced in granularity.
The de-identification approach described in this paper was evaluated with demographic data only and should be further evaluated with other potential identifiers.
Alternative de-identification policies to the Safe Harbor model can be derived for patient demographics to enable the disclosure of values that were previously suppressed. The method is generalizable to any environment in which population statistics are available.
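The comparison of policies against population statistics can be illustrated with a minimal sketch. It is not the authors' implementation: the risk measure used here (the fraction of cohort records whose generalized demographics match at most k people in the population) is an assumed stand-in, the function names `safe_harbor_like`, `alternative`, and `reid_risk` are hypothetical, and the records are invented toy data rather than census or cohort figures.

```python
from collections import Counter

# Toy records of (age, sex, race). Invented for illustration only.
population = [
    (34, "F", "White"), (34, "F", "White"), (34, "F", "Asian"),
    (52, "M", "Black"), (52, "M", "White"),
    (91, "M", "White"), (91, "M", "Black"),
    (93, "F", "White"), (93, "F", "Asian"),
]
cohort = [(34, "F", "Asian"), (52, "M", "Black"), (91, "M", "White"), (34, "F", "White")]


def safe_harbor_like(record):
    """Top-code ages over 89 into one group; keep sex and race as-is."""
    age, sex, race = record
    return ("90+" if age >= 90 else age, sex, race)


def alternative(record):
    """Hypothetical alternative: keep exact age, suppress race."""
    age, sex, _race = record
    return (age, sex, "*")


def reid_risk(cohort, population, policy, k=1):
    """Fraction of cohort records whose generalized values are shared
    by at most k people in the population (assumed risk measure)."""
    bin_sizes = Counter(policy(r) for r in population)
    at_risk = sum(1 for r in cohort if bin_sizes[policy(r)] <= k)
    return at_risk / len(cohort)


baseline = reid_risk(cohort, population, safe_harbor_like)
candidate = reid_risk(cohort, population, alternative)
print(f"Safe Harbor-like risk: {baseline:.2f}")
print(f"Alternative risk:      {candidate:.2f}")
print("Alternative acceptable:", candidate <= baseline)
```

On this toy data the Safe Harbor-like policy leaves three of the four cohort records unique in the population, while the alternative leaves none, so exact ages over 89 are released at no greater risk. With real data, the population counts would come from census tables rather than an enumerated list.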