Canim Mustafa, Kantarcioglu Murat, Malin Bradley
Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083, USA.
IEEE Trans Inf Technol Biomed. 2012 Jan;16(1):166-75. doi: 10.1109/TITB.2011.2171701. Epub 2011 Oct 17.
The biomedical community is increasingly migrating toward research endeavors that depend on large quantities of genomic and clinical data. At the same time, various regulations require that such data be shared beyond the initial collecting organization (e.g., an academic medical center). It is critically important that such data be shared and managed in a manner that upholds the privacy of the corresponding individuals and the overall security of the system. In general, organizations have attempted to achieve these goals through deidentification methods that remove explicitly and potentially identifying features (e.g., names, dates, and geocodes). However, a growing number of studies demonstrate that deidentified data can be reidentified to named individuals using simple automated methods. As an alternative, it has been shown that biomedical data can be shared, managed, and analyzed through practical cryptographic protocols without revealing the contents of any particular record. Yet such protocols require multiple third parties, which may not always be feasible given trust or bandwidth constraints. Thus, in this paper, we introduce a framework that removes the need for multiple third parties by collocating the services that store and process sensitive biomedical data through the integration of cryptographic hardware. Within this framework, we define a secure protocol to process genomic data and perform a series of experiments to demonstrate that such an approach can run efficiently for typical biomedical investigations.