Human Longevity, Inc., Mountain View, CA 94303.
Proc Natl Acad Sci U S A. 2017 Sep 19;114(38):10166-10171. doi: 10.1073/pnas.1711125114. Epub 2017 Sep 5.
Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.