Edge Michael D, Coop Graham
Center for Population Biology, University of California, Davis, Davis, United States.
Department of Evolution and Ecology, University of California, Davis, Davis, United States.
eLife. 2020 Jan 7;9:e51810. doi: 10.7554/eLife.51810.
Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people with genomes that share identical by state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
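To illustrate why matching on unphased genotypes can be exploited, here is a minimal sketch of one simple unphased IBS rule: two samples are treated as compatible along a stretch of SNPs as long as they are never opposite homozygotes. This rule, the 0/1/2 genotype coding, the function name `ibs_runs`, and the `min_sites` threshold are assumptions made for illustration; they are not taken from the abstract and are not the matching algorithm of GEDmatch or any other service.

```python
from typing import List, Tuple


def ibs_runs(a: List[int], b: List[int], min_sites: int) -> List[Tuple[int, int]]:
    """Find stretches of SNPs where two unphased genotypes are IBS-compatible.

    Genotypes are coded as alternate-allele counts (0, 1, or 2) per SNP.
    A site breaks a run only when the samples are opposite homozygotes
    (0 vs 2). Returns half-open (start, end) index ranges that span at
    least `min_sites` SNPs. Hypothetical helper, for illustration only.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must have the same length")

    runs = []
    start = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) == 2:  # opposite homozygotes: no shared allele
            if i - start >= min_sites:
                runs.append((start, i))
            start = i + 1
    # Close the final run, which ends at the last SNP.
    if len(a) - start >= min_sites:
        runs.append((start, len(a)))
    return runs


# Made-up toy genotypes over eight SNPs.
target = [0, 2, 1, 0, 2, 2, 0, 1]
honest_query = [0, 2, 1, 2, 2, 2, 0, 1]  # opposite homozygote at index 3
all_het_query = [1] * 8                  # falsified: heterozygous everywhere

honest_result = ibs_runs(target, honest_query, min_sites=3)
falsified_result = ibs_runs(target, all_het_query, min_sites=3)
```

Under this rule a heterozygous site can never be an opposite homozygote, so the falsified all-heterozygous query is compatible with the target across the whole toy region, whereas the honest query is split at the mismatching site. A matcher that compares phased haplotypes would not accept such a query in the same way.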