Population Informatics Research Group, Department of Computer Science, UNC-CH & Department of Health Policy and Management, Texas A&M Health Science Center, USA.
J Am Med Inform Assoc. 2014 Mar-Apr;21(2):212-20. doi: 10.1136/amiajnl-2013-002165. Epub 2013 Nov 7.
Record linkage to integrate uncoordinated databases is critical in biomedical research using Big Data. Balancing privacy protection against the need for high-quality record linkage requires a human-machine hybrid system to safely manage uncertainty in the ever-changing streams of chaotic Big Data.
In the computer science literature, private record linkage is the most published area. It investigates how to apply a known linkage function safely when linking two tables. In practice, however, the linkage function is rarely known. As a result, many data linkage centers exist whose main role is to act as a trusted third party that determines the linkage function manually and links data for research via a master population list for a designated region. Recently, a more flexible computerized third-party linkage platform, Secure Decoupled Linkage (SDLink), has been proposed, based on (1) decoupling data via encryption; (2) obfuscation via chaffing (adding fake data) and universe manipulation; and (3) minimum information disclosure via recoding.
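The three ideas named above (decoupling, chaffing, and recoding) can be illustrated with a small Python sketch. This is not the SDLink implementation. The function names `decouple`, `add_chaff`, and `recode_pair` are hypothetical, a keyed HMAC token stands in for encryption, and the field names are invented for the example. Universe manipulation is not modeled.

```python
import hashlib
import hmac
import random
import secrets


def decouple(records, id_fields, key):
    """Split each record into an identifying part and a sensitive part.

    Assumption: a keyed HMAC token stands in for real encryption. The two
    output tables share only this opaque token, so a person reviewing the
    identifying table never sees the sensitive attributes.
    """
    id_table, payload_table = [], []
    for i, rec in enumerate(records):
        token = hmac.new(key, str(i).encode(), hashlib.sha256).hexdigest()[:16]
        id_table.append({"token": token, **{f: rec[f] for f in id_fields}})
        payload_table.append(
            {"token": token, **{f: v for f, v in rec.items() if f not in id_fields}}
        )
    return id_table, payload_table


def add_chaff(id_table, fake_rows, rng):
    """Mix fabricated rows into the identifying table, then shuffle.

    Fake rows get random tokens of the same length as real ones, so a
    reviewer cannot tell from the token whether a row is real.
    """
    chaffed = id_table + [{"token": secrets.token_hex(8), **row} for row in fake_rows]
    rng.shuffle(chaffed)
    return chaffed


def recode_pair(a, b, fields):
    """Disclose only whether each field agrees, not the underlying values."""
    return {f: ("same" if a[f] == b[f] else "different") for f in fields}
```

In this sketch, a reviewer resolving an uncertain match would be shown the output of `recode_pair` for two candidate rows (for example, that the names differ but the dates of birth agree) rather than the raw identifiers, and would work on the chaffed identifying table rather than the original records.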
We synthesize this literature to formalize a new framework for privacy-preserving interactive record linkage (PPIRL) with tractable privacy and utility properties, and then analyze the literature using this framework.
Human-based third-party linkage centers for privacy-preserving record linkage are the accepted norm internationally. We find that a computer-based third-party platform that can precisely control the information disclosed at the micro level and allow frequent human interaction during the linkage process is an effective human-machine hybrid system that significantly improves on the linkage center model in terms of both privacy and utility.