Naveed Muhammad, Ayday Erman, Clayton Ellen W, Fellay Jacques, Gunter Carl A, Hubaux Jean-Pierre, Malin Bradley A, Wang Xiaofeng
University of Illinois at Urbana-Champaign.
Ecole Polytechnique Federale de Lausanne.
ACM Comput Surv. 2015 Sep;48(1). doi: 10.1145/2767007.
Genome sequencing technology has advanced at a rapid pace, and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) an association with traits and certain diseases, identification capability (e.g., forensics), and revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field, and we report on a survey about genome data privacy that we conducted with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, and we contextualize these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.