Sweeney L
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, USA.
Proc AMIA Annu Fall Symp. 1997:51-5.
We present a computer program named Datafly that maintains anonymity in medical data by automatically generalizing, substituting, and removing information as appropriate without losing many of the details found within the data. Decisions are made at the field and record level at the time of database access, so the approach can be used on the fly in role-based security within an institution, and in batch mode for exporting data from an institution. Often organizations release and receive medical data with all explicit identifiers, such as name, address and phone number, removed in the incorrect belief that patient confidentiality is maintained because the resulting data look anonymous; however, we show the remaining data can often be used to re-identify individuals by linking or matching the data to other databases or by looking at unique characteristics found in the fields and records of the database itself. When these less apparent aspects are taken into account, each released record can be made to ambiguously map to many possible people, providing a level of anonymity determined by the user.
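The idea that each released record should map ambiguously to many possible people can be illustrated with a small sketch. This is not Datafly's actual algorithm, which the abstract does not specify. It is a simplified illustration that assumes two identifying fields (birth date and ZIP code), fixed generalization rules, and a user-chosen minimum group size `k`. All function names, field names, and sample records are hypothetical.

```python
from collections import Counter


def generalize_zip(zip_code, digits_kept):
    """Keep the leading digits of a ZIP code and mask the rest with '*'."""
    return zip_code[:digits_kept] + "*" * (len(zip_code) - digits_kept)


def generalize_birth_date(date_str):
    """Reduce an ISO date (YYYY-MM-DD) to its year."""
    return date_str[:4]


def anonymize(records, k, zip_digits=3):
    """Generalize birth date and ZIP code, then remove any record whose
    (birth, zip) combination is shared by fewer than k records.

    Hypothetical illustration only; not the published Datafly procedure.
    """
    generalized = [
        {
            "birth": generalize_birth_date(r["birth"]),
            "zip": generalize_zip(r["zip"], zip_digits),
            "diagnosis": r["diagnosis"],
        }
        for r in records
    ]
    # Count how many records share each generalized combination.
    counts = Counter((g["birth"], g["zip"]) for g in generalized)
    # Keep only records that are indistinguishable from at least k - 1 others.
    return [g for g in generalized if counts[(g["birth"], g["zip"])] >= k]


# Made-up sample data with explicit identifiers already removed.
records = [
    {"birth": "1964-09-15", "zip": "02139", "diagnosis": "asthma"},
    {"birth": "1964-02-28", "zip": "02138", "diagnosis": "diabetes"},
    {"birth": "1964-11-03", "zip": "02141", "diagnosis": "hypertension"},
    {"birth": "1971-06-21", "zip": "02139", "diagnosis": "asthma"},
]

released = anonymize(records, k=2)
for row in released:
    print(row)
```

In this sketch the three 1964 records become indistinguishable on birth year and ZIP prefix, while the single 1971 record remains unique after generalization and is removed. Raising `k` or lowering `zip_digits` trades detail for a larger group of possible matches.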