Malin Bradley
Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, Tennessee, USA.
AMIA Annu Symp Proc. 2006;2006:524-8.
Many genome-based research projects include familial relationships, such as pedigrees, with genomic data records. To protect anonymity when sharing family information, data holders remove or encode explicit identifiers (e.g., personal names). In this paper, however, we introduce IdentiFamily, a software program that can link de-identified family relations to named people. The program extracts genealogical knowledge from publicly available records and ascertains the re-identification risk for specific family relations. We find that robust genealogies of current populations can be extracted from online sources, such as newspaper obituaries and death records. We evaluate IdentiFamily on real-world data for a state's capital city and demonstrate unique identifiability for approximately 70% of the population. IdentiFamily provides organizations with a tool to evaluate the anonymity of pedigrees prior to disclosure and to design formal privacy protection techniques.
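The notion of unique identifiability can be illustrated with a minimal sketch. It is not the IdentiFamily implementation, which the abstract does not describe in detail. It assumes that a de-identified family is reduced to a structural signature, here the sorted (generation, gender) pairs of its members, and that a person is uniquely identifiable when no other family in the population shares that signature. The field names `generation` and `gender`, the functions `family_signature` and `unique_fraction`, and the toy families are all hypothetical.

```python
from collections import Counter


def family_signature(family):
    """Hypothetical de-identified view of a family: names are dropped and
    only the sorted (generation, gender) pairs of its members remain."""
    return tuple(sorted((m["generation"], m["gender"]) for m in family))


def unique_fraction(families):
    """Fraction of people who belong to a family whose signature occurs
    exactly once in the population."""
    signatures = [family_signature(f) for f in families]
    counts = Counter(signatures)
    total = sum(len(f) for f in families)
    unique = sum(len(f) for f, s in zip(families, signatures) if counts[s] == 1)
    return unique / total if total else 0.0


# Toy population (illustrative only): generation 0 = parents, 1 = children.
parents = [{"generation": 0, "gender": "M"}, {"generation": 0, "gender": "F"}]
families = [
    parents + [{"generation": 1, "gender": "F"}],
    parents + [{"generation": 1, "gender": "F"}],
    parents + [{"generation": 1, "gender": "M"}, {"generation": 1, "gender": "M"}],
]

print(unique_fraction(families))
```

In this toy population the first two families share a signature, so only the four members of the third family count as uniquely identifiable. A real assessment would draw the named population from public records and would likely use richer features than this sketch assumes.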