Characterization and correction of error in genome-wide IBD estimation for samples with population structure.

Affiliation

Department of Biostatistics, University of Washington, Seattle, Washington 98195-7232, USA.

Publication information

Genet Epidemiol. 2013 Sep;37(6):635-41. doi: 10.1002/gepi.21737. Epub 2013 Jun 5.

DOI:10.1002/gepi.21737

PMID:23740691

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4001853/

Abstract

The proportion of the genome that is shared identical by descent (IBD) between pairs of individuals is often estimated in studies involving genome-wide SNP data. These estimates can be used to check pedigrees, estimate heritability, and adjust association analyses. We focus on the method of moments technique as implemented in PLINK [Purcell et al., 2007] and other software that estimates the proportions of the genome at which two individuals share 0, 1, or 2 alleles IBD. This technique is based on the assumption that the study sample is drawn from a single, homogeneous, randomly mating population. This assumption is violated if pedigree founders are drawn from multiple populations or include admixed individuals. In the presence of population structure, the method of moments estimator has an inflated variance and can be biased because it relies on sample-based allele frequency estimates. In the case of the PLINK estimator, which truncates genome-wide sharing estimates at zero and one to generate biologically interpretable results, the bias is most often towards over-estimation of relatedness between ancestrally similar individuals. Using simulated pedigrees, we are able to demonstrate and quantify the behavior of the PLINK method of moments estimator under different population structure conditions. We also propose a simple method based on SNP pruning for improving genome-wide IBD estimates when the assumption of a single, homogeneous population is violated.
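The method of moments idea described above can be illustrated with a minimal Python sketch. It is not PLINK's code: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 copies of a reference allele, allele frequencies are assumed known and strictly between 0 and 1, and refinements such as finite-sample corrections are omitted. The final clipping of the genome-wide sharing estimate to [0, 1] is a simplified stand-in for the truncation mentioned in the abstract.

```python
def ibs_count(g1, g2):
    """Alleles shared identical by state (0, 1, or 2) for genotypes coded 0/1/2."""
    return 2 - abs(g1 - g2)


def expected_ibs_given_ibd(p):
    """P(IBS = i | IBD = z) at one biallelic SNP with allele frequency p.

    Keys are (i, z). Assumes a single homogeneous, randomly mating
    population, which is exactly the assumption the paper examines.
    """
    q = 1.0 - p
    return {
        (0, 0): 2 * p**2 * q**2,
        (1, 0): 4 * p**3 * q + 4 * p * q**3,
        (2, 0): p**4 + q**4 + 4 * p**2 * q**2,
        (1, 1): 2 * p * q,
        (2, 1): 1 - 2 * p * q,
        (2, 2): 1.0,
    }


def mom_ibd(geno1, geno2, freqs):
    """Return (z0, z1, z2, pi_hat) for one pair of individuals.

    z0, z1, z2 are the raw (unbounded) estimates of the proportions of the
    genome shared 0, 1, or 2 alleles IBD; pi_hat = z1/2 + z2 is clipped
    to [0, 1]. Requires at least one SNP with 0 < p < 1.
    """
    observed = [0, 0, 0]  # observed counts of IBS 0, 1, 2 across SNPs
    expected = {}         # expected counts, summed over SNPs, keyed by (ibs, ibd)
    for g1, g2, p in zip(geno1, geno2, freqs):
        observed[ibs_count(g1, g2)] += 1
        for key, prob in expected_ibs_given_ibd(p).items():
            expected[key] = expected.get(key, 0.0) + prob

    # Solve the moment equations from IBS0 upward.
    z0 = observed[0] / expected[(0, 0)]
    z1 = (observed[1] - z0 * expected[(1, 0)]) / expected[(1, 1)]
    z2 = (observed[2] - z0 * expected[(2, 0)] - z1 * expected[(2, 1)]) / expected[(2, 2)]

    pi_hat = min(1.0, max(0.0, z1 / 2 + z2))
    return z0, z1, z2, pi_hat


if __name__ == "__main__":
    # Toy data for illustration only; real analyses use many thousands of SNPs.
    a = [0, 1, 2, 1, 0]
    b = [2, 1, 0, 0, 1]
    p = [0.5, 0.3, 0.2, 0.4, 0.6]
    print(mom_ibd(a, b, p))
```

Because the expected counts are computed from the supplied allele frequencies, frequencies estimated from a structured sample feed directly into every term, which is the route by which population structure can bias the estimates.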

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1696/4001853/39eafc8e20df/nihms-563805-f0001.jpg

Similar articles

Inference of relationships in population data using identity-by-descent and identity-by-state.

PLoS Genet. 2011 Sep;7(9):e1002287. doi: 10.1371/journal.pgen.1002287. Epub 2011 Sep 22.

Using identity by descent estimation with dense genotype data to detect positive selection.

Eur J Hum Genet. 2013 Feb;21(2):205-11. doi: 10.1038/ejhg.2012.148. Epub 2012 Jul 11.

Estimating kinship in admixed populations.

Am J Hum Genet. 2012 Jul 13;91(1):122-38. doi: 10.1016/j.ajhg.2012.05.024. Epub 2012 Jun 28.

Estimating Genetic Relatedness in Admixed Populations.

G3 (Bethesda). 2018 Oct 3;8(10):3203-3220. doi: 10.1534/g3.118.200485.

A fast, powerful method for detecting identity by descent.

Am J Hum Genet. 2011 Feb 11;88(2):173-82. doi: 10.1016/j.ajhg.2011.01.010.

ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure.

Am J Hum Genet. 2010 Feb 12;86(2):172-84. doi: 10.1016/j.ajhg.2010.01.001. Epub 2010 Feb 4.

A Unified Characterization of Population Structure and Relatedness.

Genetics. 2017 Aug;206(4):2085-2103. doi: 10.1534/genetics.116.198424. Epub 2017 May 26.

Fast and Accurate Shared Segment Detection and Relatedness Estimation in Un-phased Genetic Data via TRUFFLE.

Am J Hum Genet. 2019 Jul 3;105(1):78-88. doi: 10.1016/j.ajhg.2019.05.007. Epub 2019 Jun 6.

Investigating pedigree- and SNP-associated components of heritability in a wild population of Soay sheep.

Heredity (Edinb). 2024 Apr;132(4):202-210. doi: 10.1038/s41437-024-00673-6. Epub 2024 Feb 10.

Cited by

Systematic bias in malaria parasite relatedness estimation.

G3 (Bethesda). 2025 May 8;15(5). doi: 10.1093/g3journal/jkaf018.

Genetic variants in canonical Wnt signaling pathway associated with pediatric immune thrombocytopenia.

Blood Adv. 2024 Nov 12;8(21):5529-5538. doi: 10.1182/bloodadvances.2024012776.

A machine learning approach for missing persons cases with high genotyping errors.

Front Genet. 2022 Oct 3;13:971242. doi: 10.3389/fgene.2022.971242. eCollection 2022.

PATRIOT: A Pipeline for Tracing Identity-by-Descent for Chromosome Segments to Improve Genomic Prediction in Self-Pollinating Crop Species.

Front Plant Sci. 2021 Sep 29;12:676269. doi: 10.3389/fpls.2021.676269. eCollection 2021.

DNA-based genealogy reconstruction of Nebbiolo, Barbera and other ancient grapevine cultivars from northwestern Italy.

Sci Rep. 2020 Sep 25;10(1):15782. doi: 10.1038/s41598-020-72799-6.

Quickly identifying identical and closely related subjects in large databases using genotype data.

PLoS One. 2017 Jun 13;12(6):e0179106. doi: 10.1371/journal.pone.0179106. eCollection 2017.

Estimating relationships between phenotypes and subjects drawn from admixed families.

BMC Proc. 2016 Oct 18;10(Suppl 7):357-362. doi: 10.1186/s12919-016-0056-3. eCollection 2016.

Pleiotropic Mechanisms Indicated for Sex Differences in Autism.

PLoS Genet. 2016 Nov 15;12(11):e1006425. doi: 10.1371/journal.pgen.1006425. eCollection 2016 Nov.

PADRE: Pedigree-Aware Distant-Relationship Estimation.

Am J Hum Genet. 2016 Jul 7;99(1):154-62. doi: 10.1016/j.ajhg.2016.05.020. Epub 2016 Jun 30.

Model-free Estimation of Recent Genetic Relatedness.

Am J Hum Genet. 2016 Jan 7;98(1):127-48. doi: 10.1016/j.ajhg.2015.11.022.

References

Estimating kinship in admixed populations.

Am J Hum Genet. 2012 Jul 13;91(1):122-38. doi: 10.1016/j.ajhg.2012.05.024. Epub 2012 Jun 28.

Estimating missing heritability for disease from genome-wide association studies.

Am J Hum Genet. 2011 Mar 11;88(3):294-305. doi: 10.1016/j.ajhg.2011.02.002. Epub 2011 Mar 3.

Robust relationship inference in genome-wide association studies.

Bioinformatics. 2010 Nov 15;26(22):2867-73. doi: 10.1093/bioinformatics/btq559. Epub 2010 Oct 5.

Quality control and quality assurance in genotypic data for genome-wide association studies.

Genet Epidemiol. 2010 Sep;34(6):591-602. doi: 10.1002/gepi.20516.

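ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure.

Am J Hum Genet. 2010 Feb 12;86(2):172-84. doi: 10.1016/j.ajhg.2010.01.001. Epub 2010 Feb 4.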

Sensitive detection of chromosomal segments of distinct ancestry in admixed populations.

PLoS Genet. 2009 Jun;5(6):e1000519. doi: 10.1371/journal.pgen.1000519. Epub 2009 Jun 19.

A second generation human haplotype map of over 3.1 million SNPs.

Nature. 2007 Oct 18;449(7164):851-61. doi: 10.1038/nature06258.

PLINK: a tool set for whole-genome association and population-based linkage analyses.

Am J Hum Genet. 2007 Sep;81(3):559-75. doi: 10.1086/519795. Epub 2007 Jul 25.

Mapping by admixture linkage disequilibrium in human populations: limits and guidelines.

Am J Hum Genet. 1994 Oct;55(4):809-24.
