Liam O'Neill, Franklin Dexter, and Nan Zhang
From the *Department of Health Management and Policy, School of Public Health, University of North Texas-Health Science Center, Fort Worth, Texas; †Division of Management Consulting, Department of Anesthesia, University of Iowa, Iowa City, Iowa; and ‡Department of Computer Science, George Washington University, Washington, DC.
Anesth Analg. 2016 Jun;122(6):2017-27. doi: 10.1213/ANE.0000000000001331.
In this article, we consider the privacy implications of posting data from small, randomized trials, observational studies, or case series in anesthesia conducted at a few (e.g., 1-3) hospitals. Before publishing such data as supplemental digital content, authors remove attributes that could be used to re-identify individuals, a process known as "anonymization." Posting health information that has been properly "de-identified" is assumed to pose no risk to patient privacy. Yet computer scientists have demonstrated that this assumption is flawed. We consider several realistic scenarios in which the publication of such data could lead to breaches of patient privacy, and we review examples of successful privacy attacks together with the methods used. We survey the latest models and methods from computer science for protecting health information and their application to posting data from small anesthesia studies. To illustrate the vulnerability of such published data, we calculate the "population uniqueness" of patients undergoing one or more surgical procedures using data from the State of Texas. For a patient selected uniformly at random, the probability that an adversary could match that patient's record to a unique record in the external database for the state was 42.8% (SE < 0.1%). Although 42.8% is already an unacceptably high level of risk, it underestimates the risk for patients from smaller states or provinces. We propose an editorial policy that greatly reduces the likelihood of a privacy breach while supporting the goal of transparency in the research process.
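To make the "population uniqueness" measure concrete, the sketch below shows one plausible way to compute it: the fraction of records whose combination of quasi-identifiers (e.g., demographics plus procedure codes) appears exactly once in a population dataset, together with a binomial standard error for that proportion. This is a minimal illustration, not the authors' actual method; the field names, quasi-identifiers, and example records are hypothetical, and the real analysis used Texas state discharge data.

```python
import math
from collections import Counter

def population_uniqueness(records, quasi_identifiers):
    """Return (proportion unique, standard error) over the given quasi-identifiers.

    A record is "population unique" when no other record in the dataset shares
    its combination of quasi-identifier values, so an adversary who knows those
    attributes could match the person to exactly one record.
    """
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    n = len(keys)
    p = sum(1 for k in keys if counts[k] == 1) / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of a sample proportion
    return p, se

# Hypothetical example: year of birth, sex, and the set of procedure codes
# serve as quasi-identifiers. The tuple of procedures keeps the key hashable.
records = [
    {"birth_year": 1957, "sex": "F", "procedures": ("44970",)},
    {"birth_year": 1957, "sex": "F", "procedures": ("44970",)},
    {"birth_year": 1983, "sex": "M", "procedures": ("47562", "44970")},
]
p, se = population_uniqueness(records, ["birth_year", "sex", "procedures"])
print(f"population uniqueness = {p:.1%} (SE {se:.1%})")
```

Under this reading, the paper's 42.8% figure corresponds to p computed over the full state database, and the SE < 0.1% follows from the very large n in such a database.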