Tamersoy Acar, Loukides Grigorios, Denny Joshua C, Malin Bradley
Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee.
AMIA Annu Symp Proc. 2010 Nov 13;2010:782-6.
Patient-specific data from electronic medical records (EMRs) is increasingly shared in a de-identified form to support research. However, EMRs are susceptible to noise, error, and variation, which can limit their utility for reuse. One way to enhance the utility of EMRs is to record the number of times diagnosis codes are assigned to a patient when this data is shared. This is, however, challenging because releasing such data may be leveraged to compromise patients' identity. In this paper, we present an approach that, to the best of our knowledge, is the first that can prevent re-identification through repeated diagnosis codes. Our method transforms records to preserve privacy while retaining much of their utility. Experiments conducted using 2676 patients from the EMR system of the Vanderbilt University Medical Center verify that our method is able to retain an average of 95.4% of the diagnosis codes in a common data sharing scenario.
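The re-identification risk described above can be made concrete with a minimal sketch. It is not the method presented in the paper. It only illustrates, under assumed data, why releasing per-patient counts of diagnosis codes can single out individuals. The toy records, the function names `group_sizes` and `at_risk`, and the threshold `k` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each record maps a diagnosis code to the number of
# times it was assigned to one patient. Codes and counts are invented.
records = [
    {"250.00": 3, "401.1": 1},
    {"250.00": 3, "401.1": 1},
    {"250.00": 2, "401.1": 1},
    {"272.4": 5},
]


def group_sizes(records):
    """Count how many records share each exact set of (code, count) pairs."""
    return Counter(frozenset(r.items()) for r in records)


def at_risk(records, k):
    """Return indices of records whose (code, count) set is shared by
    fewer than k records, i.e. records an attacker who knows a patient's
    repeated codes could narrow down to fewer than k candidates."""
    sizes = group_sizes(records)
    return [i for i, r in enumerate(records)
            if sizes[frozenset(r.items())] < k]


print(at_risk(records, k=2))
```

In this toy data the third record differs from the first two only in how many times one code was assigned, which is the kind of distinction that becomes visible once repeat counts are released.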