Utilizing graph theory to select the largest set of unrelated individuals for genetic analysis.

Affiliation

Department of Genome Sciences, The University of Washington, Seattle, WA 98195, USA.

Publication information

Genet Epidemiol. 2013 Feb;37(2):136-41. doi: 10.1002/gepi.21684. Epub 2012 Sep 19.

DOI:10.1002/gepi.21684

PMID:22996348

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3770842/

Abstract

Many statistical analyses of genetic data rely on the assumption of independence among samples. Consequently, relatedness is either modeled in the analysis or samples are removed to "clean" the data of any pairwise relatedness above a tolerated threshold. Current methods do not maximize the number of unrelated individuals retained for further analysis, and this is a needless loss of resources. We report a novel application of graph theory that identifies the maximum set of unrelated samples in any dataset given a user-defined threshold of relatedness as well as all networks of related samples. We have implemented this method into an open source program called Pedigree Reconstruction and Identification of a Maximum Unrelated Set, PRIMUS. We show that PRIMUS outperforms the three existing methods, allowing researchers to retain up to 50% more unrelated samples. A unique strength of PRIMUS is its ability to weight the maximum clique selection using additional criteria (e.g. affected status and data missingness). PRIMUS is a permanent solution to identifying the maximum number of unrelated samples for a genetic analysis.
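
The abstract describes the approach only at a high level, so the following is a minimal sketch of the underlying graph idea and not the PRIMUS implementation. It assumes pairwise relatedness estimates are available as `(id_a, id_b, value)` tuples; all function names, the threshold, and the example values are hypothetical. Individuals are nodes, pairs at or above the threshold are joined by an edge, and within each network of related samples the largest unrelated set is a maximum clique of the complement graph. The search below uses Bron-Kerbosch with pivoting, which is one standard choice made for this illustration; it has exponential worst-case cost. The optional `weight` function (assumed positive) illustrates weighting the selection by criteria such as affected status.

```python
def related_graph(pairs, threshold):
    """Adjacency sets with an edge wherever relatedness >= threshold."""
    adj = {}
    for a, b, r in pairs:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if r >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components(adj):
    """Yield each connected network of related samples as a set."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        yield comp


def best_unrelated(comp, adj, weight):
    """Max-weight clique of the complement graph restricted to comp."""
    # In the complement graph, neighbours are the UNrelated members.
    unrel = {v: comp - adj[v] - {v} for v in comp}
    best_w, best_set = float("-inf"), set()

    def expand(r, p, x):
        nonlocal best_w, best_set
        if not p and not x:  # r is a maximal set of unrelated samples
            w = sum(weight(v) for v in r)
            if w > best_w:
                best_w, best_set = w, set(r)
            return
        pivot = max(p | x, key=lambda u: len(p & unrel[u]))
        for v in list(p - unrel[pivot]):
            expand(r | {v}, p & unrel[v], x & unrel[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(comp), set())
    return best_set


def maximum_unrelated_set(pairs, threshold, weight=lambda v: 1):
    """Union of the best unrelated subset from every related network."""
    adj = related_graph(pairs, threshold)
    chosen = set()
    for comp in components(adj):
        chosen |= best_unrelated(comp, adj, weight)
    return chosen


# Hypothetical data: H is related to S1, S2 and S3; E is related to no one.
pairs = [("H", "S1", 0.25), ("H", "S2", 0.25), ("H", "S3", 0.25),
         ("S1", "S2", 0.01), ("S3", "E", 0.0)]

print(sorted(maximum_unrelated_set(pairs, threshold=0.1)))
# → ['E', 'S1', 'S2', 'S3']

# Up-weighting H (for example, an affected case) changes the selection.
print(sorted(maximum_unrelated_set(pairs, 0.1, lambda v: 5 if v == "H" else 1)))
# → ['E', 'H']
```

In the unweighted run, dropping H alone keeps four samples, whereas a procedure that kept H would retain only two; the weighted run shows how a higher weight on one sample can reverse that choice.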

Similar articles

PRIMUS: rapid reconstruction of pedigrees from genome-wide estimates of identity by descent.

Am J Hum Genet. 2014 Nov 6;95(5):553-64. doi: 10.1016/j.ajhg.2014.10.005. Epub 2014 Oct 30.

PADRE: Pedigree-Aware Distant-Relationship Estimation.

Am J Hum Genet. 2016 Jul 7;99(1):154-62. doi: 10.1016/j.ajhg.2016.05.020. Epub 2016 Jun 30.

An impatient evolutionary algorithm with probabilistic tabu search for unified solution of some NP-hard problems in graph and set theory via clique finding.

IEEE Trans Syst Man Cybern B Cybern. 2008 Jun;38(3):645-66. doi: 10.1109/TSMCB.2008.915645.

Friends and family: A software program for identification of unrelated individuals from molecular marker data.

Mol Ecol Resour. 2017 Nov;17(6):e225-e233. doi: 10.1111/1755-0998.12691. Epub 2017 Jun 13.

Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.

Genet Epidemiol. 2015 May;39(4):276-93. doi: 10.1002/gepi.21896. Epub 2015 Mar 23.

Inference of relationships in population data using identity-by-descent and identity-by-state.

PLoS Genet. 2011 Sep;7(9):e1002287. doi: 10.1371/journal.pgen.1002287. Epub 2011 Sep 22.

PRIMUS: improving pedigree reconstruction using mitochondrial and Y haplotypes.

Bioinformatics. 2016 Feb 15;32(4):596-8. doi: 10.1093/bioinformatics/btv618. Epub 2015 Oct 29.

Identifying cryptic relationships.

Methods Mol Biol. 2012;850:47-57. doi: 10.1007/978-1-61779-555-8_4.

Sparse principal component analysis for identifying ancestry-informative markers in genome-wide association studies.

Genet Epidemiol. 2012 May;36(4):293-302. doi: 10.1002/gepi.21621. Epub 2012 Apr 16.

Cited by

Frequency enrichment of coding variants in a French-Canadian founder population and its implication for inflammatory bowel diseases.

medRxiv. 2025 Jul 14:2025.07.11.25331388. doi: 10.1101/2025.07.11.25331388.

Multiomics reveal key inflammatory drivers of severe obesity: IL4R, LILRA5, and OSM.

Cell Genom. 2025 Mar 12;5(3):100784. doi: 10.1016/j.xgen.2025.100784. Epub 2025 Mar 4.

Detection of distant relatedness in biobanks to identify undiagnosed cases of Mendelian disease as applied to Long QT syndrome.

Nat Commun. 2024 Aug 29;15(1):7507. doi: 10.1038/s41467-024-51977-4.

Integrating Genetic and Transcriptomic Data to Identify Genes Underlying Obesity Risk Loci.

medRxiv. 2024 Jun 12:2024.06.11.24308730. doi: 10.1101/2024.06.11.24308730.

Polymorphic short tandem repeats make widespread contributions to blood and serum traits.

Cell Genom. 2023 Dec 13;3(12):100458. doi: 10.1016/j.xgen.2023.100458.

Exome-wide assessment of isolated biliary atresia: A report from the National Birth Defects Prevention Study using child-parent trios and a case-control design to identify novel rare variants.

Am J Med Genet A. 2023 Jun;191(6):1546-1556. doi: 10.1002/ajmg.a.63185. Epub 2023 Mar 21.

The female protective effect against autism spectrum disorder.

Cell Genom. 2022 Jun 8;2(6):100134. doi: 10.1016/j.xgen.2022.100134.

Genomic Assessment of Cancer Susceptibility in the Threatened Catalina Island Fox ().

Genes (Basel). 2022 Aug 22;13(8):1496. doi: 10.3390/genes13081496.

A loss-of-function IFNAR1 allele in Polynesia underlies severe viral diseases in homozygotes.

J Exp Med. 2022 Jun 6;219(6). doi: 10.1084/jem.20220028. Epub 2022 Apr 20.

Evaluating the utility of identity-by-descent segment numbers for relatedness inference via information theory and classification.

G3 (Bethesda). 2022 May 30;12(6). doi: 10.1093/g3journal/jkac072.
