Malin Bradley, Karp David, Scheuermann Richard H
Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37203, USA.
J Investig Med. 2010 Jan;58(1):11-8. doi: 10.2310/JIM.0b013e3181c9b2ea.
Clinical researchers need to share data to support scientific validation and information reuse and to comply with a host of regulations and directives from funders. Various organizations are constructing informatics resources in the form of centralized databases to ensure reuse of data derived from sponsored research. The widespread use of such open databases is contingent on the protection of patient privacy.
We review privacy-related problems associated with data sharing for clinical research from technical and policy perspectives. We investigate existing policies for secondary data sharing and privacy requirements in the context of data derived from research and clinical settings. In particular, we focus on policies specified by the US National Institutes of Health and the Health Insurance Portability and Accountability Act and touch on how these policies are related to current and future use of data stored in public database archives. We address aspects of data privacy and identifiability from a technical, although approachable, perspective and summarize how biomedical databanks can be exploited and seemingly anonymous records can be reidentified using various resources without hacking into secure computer systems.
We highlight which clinical and translational data features, specified in emerging research models, are potentially vulnerable or exploitable. In the process, we recount a recent privacy-related concern associated with the publication of aggregate statistics from pooled genome-wide association studies that has had a significant impact on the data sharing policies of National Institutes of Health-sponsored databanks.
Based on our analysis and observations, we provide a list of recommendations that cover various technical, legal, and policy mechanisms that open clinical databases can adopt to strengthen data privacy protection as they move toward wider deployment and adoption.