Zarate Oscar A, Brody Julia Green, Brown Phil, Ramirez-Andreotta Mónica D, Perovich Laura, Matz Jacob
Hastings Cent Rep. 2016 Jan-Feb;46(1):36-45. doi: 10.1002/hast.523. Epub 2015 Dec 17.
An individual's health, genetic, or environmental-exposure data, placed in an online repository, create a valuable shared resource that can accelerate biomedical research and even open opportunities for crowd-sourcing discoveries by members of the public. But these data become "immortalized" in ways that may create lasting risk as well as benefit. Once shared on the Internet, the data are difficult or impossible to redact, and identities may be revealed by a process called data linkage, in which online data sets are matched to each other. Reidentification (re-ID), the process of associating an individual's name with data that were considered deidentified, poses risks such as insurance or employment discrimination, social stigma, and breach of the promises often made in informed-consent documents. At the same time, re-ID poses risks to researchers and indeed to the future of science, should re-ID end up undermining the trust and participation of potential research participants. The ethical challenges of online data sharing are heightened as so-called big data becomes an increasingly important research tool and driver of new research structures. Big data is shifting research to include large numbers of researchers and institutions as well as large numbers of participants providing diverse types of data, so the participants' consent relationship is no longer with a person or even a research institution. In addition, consent is further transformed because big data analysis often begins with descriptive inquiry and generation of a hypothesis, and the research questions cannot be clearly defined at the outset and may be unforeseeable over the long term. In this article, we consider how expanded data sharing poses new challenges, illustrated by genomics and the transition to new models of consent.
We draw on the experiences of participants in an open data platform, the Personal Genome Project, to allow study participants to contribute their voices to inform ethical consent practices and protocol reviews for big-data research.