Needles in the haystack: identifying individuals present in pooled genomic data.

Authors

Braun Rosemary, Rowe William, Schaefer Carl, Zhang Jinghui, Buetow Kenneth

Affiliation

Laboratory of Population Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America.

Publication information

PLoS Genet. 2009 Oct;5(10):e1000668. doi: 10.1371/journal.pgen.1000668. Epub 2009 Oct 2.

DOI:10.1371/journal.pgen.1000668
PMID:19798441
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2747273/
Abstract

Recent publications have described and applied a novel metric that quantifies the genetic distance of an individual with respect to two population samples, and have suggested that the metric makes it possible to infer the presence of an individual of known genotype in a sample for which only the marginal allele frequencies are known. However, the assumptions, limitations, and utility of this metric remained incompletely characterized. Here we present empirical tests of the method using publicly accessible genotypes, as well as analytical investigations of the method's strengths and limitations. The results reveal that the null distribution is sensitive to the underlying assumptions, making it difficult to accurately calibrate thresholds for classifying an individual as a member of the population samples. As a result, the false-positive rates obtained in practice are considerably higher than previously believed. However, despite the metric's inadequacies for identifying the presence of an individual in a sample, our results suggest potential avenues for future research on tuning this method to problems of ancestry inference or disease prediction. By revealing both the strengths and limitations of the proposed method, we hope to elucidate situations in which this distance metric may be used in an appropriate manner. We also discuss the implications of our findings in forensics applications and in the protection of GWAS participant privacy.
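The abstract does not spell out the metric. As a minimal sketch, assuming it is the per-SNP distance introduced by Homer et al. (PLoS Genet. 2008), D_j = |Y_j - Pop_j| - |Y_j - M_j|, the statistic can be computed as below. The function names, the simulation setup (unlinked SNPs, Hardy-Weinberg genotypes, reference and pool drawn from the same population), and the test against a null mean of zero are illustrative assumptions, not the authors' code. These idealized conditions are the assumptions the abstract warns the null distribution is sensitive to.

```python
import math
import random
import statistics


def distance_statistic(genotype, ref_freqs, mix_freqs):
    """Return (mean D, t-statistic) for one individual.

    genotype:  allele fraction of the individual at each SNP (0.0, 0.5 or 1.0)
    ref_freqs: allele frequencies in the reference population sample
    mix_freqs: allele frequencies in the pooled sample under test

    Assumes D_j = |Y_j - Pop_j| - |Y_j - M_j| and a null mean of zero.
    """
    d = [abs(y - p) - abs(y - m)
         for y, p, m in zip(genotype, ref_freqs, mix_freqs)]
    mean_d = statistics.fmean(d)
    sd_d = statistics.stdev(d)  # needs at least two SNPs
    if sd_d == 0:
        raise ValueError("per-SNP distances have zero variance")
    t = mean_d / (sd_d / math.sqrt(len(d)))
    return mean_d, t


def simulate(n_snps=2000, pool_size=50, seed=0):
    """Hypothetical toy simulation: t for a pool member and for an outsider."""
    rng = random.Random(seed)
    true_freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]

    def draw_person():
        # Two independent allele draws per SNP (Hardy-Weinberg, no linkage).
        return [((rng.random() < f) + (rng.random() < f)) / 2
                for f in true_freqs]

    def sample_freqs(people):
        # Marginal allele frequency at each SNP across the sampled people.
        return [statistics.fmean(col) for col in zip(*people)]

    reference = [draw_person() for _ in range(pool_size)]
    pool = [draw_person() for _ in range(pool_size)]
    outsider = draw_person()

    ref_f, pool_f = sample_freqs(reference), sample_freqs(pool)
    _, t_member = distance_statistic(pool[0], ref_f, pool_f)
    _, t_outsider = distance_statistic(outsider, ref_f, pool_f)
    return t_member, t_outsider


if __name__ == "__main__":
    t_member, t_outsider = simulate()
    print(f"t (pool member) = {t_member:.2f}")
    print(f"t (outsider)    = {t_outsider:.2f}")
```

A large positive t suggests the genotype is closer to the pool than to the reference. In this idealized setting the outsider's t should scatter around zero, but that calibration holds only while the simulation's assumptions do.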


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dbe1/2747273/3c9a6bd08d13/pgen.1000668.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dbe1/2747273/c8317b7c7c3a/pgen.1000668.g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dbe1/2747273/399fb8f1dfda/pgen.1000668.g003.jpg
