Genotypic discrepancies arising from imputation.

Authors

Hinrichs Anthony L, Culverhouse Robert C, Suarez Brian K

Affiliations

Department of Psychiatry, Washington University School of Medicine, St. Louis, MO 63110, USA.

Department of Medicine and Division of Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, USA.

Publication information

BMC Proc. 2014 Jun 17;8(Suppl 1 Genetic Analysis Workshop 18):S17. doi: 10.1186/1753-6561-8-S1-S17. eCollection 2014.

DOI:10.1186/1753-6561-8-S1-S17

PMID:25519370

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4143754/

Abstract

The ideal genetic analysis of family data would include whole-genome sequence data for all family members. A strategy of combining sequence data from a subset of key individuals with inexpensive genome-wide association study (GWAS) chip genotypes on all individuals, in order to infer sequence-level genotypes throughout the families, has been suggested as a highly accurate alternative. This strategy was followed by the Genetic Analysis Workshop 18 data providers. We examined the quality of the imputation, to identify potential consequences of this strategy, by comparing GWAS genotype calls with imputed calls for the same variants and tallying the discrepancies. Overall, the inference and imputation process worked very well. However, we found that discrepancies occurred at an increased rate when imputation was used to infer missing data in sequenced individuals. Although this may be an artifact of this particular implementation of these analytic methods, there may be general genetic or algorithmic reasons to avoid trying to fill in missing sequence data. This is especially true given the risk of false positives and the reduction in power for family-based transmission tests when founders are incorrectly imputed as heterozygotes. Finally, we note a higher rate of discrepancies when unsequenced individuals are inferred using sequenced individuals from other pedigrees drawn from the same admixed population.
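The two quantitative ideas in the abstract, a per-category discrepancy rate between GWAS and imputed calls and the effect of a founder wrongly imputed as a heterozygote on a transmission test, can be sketched as below. This is a minimal toy illustration under stated assumptions, not the authors' pipeline: genotypes are coded as alternate-allele dosages (0, 1, 2), the category labels (`sequenced_missing_filled`, `unsequenced_inferred`), the function names and all data are hypothetical, and the second part uses the standard transmission/disequilibrium test (TDT) statistic (b − c)² / (b + c).

```python
from collections import defaultdict

# Toy illustration only: genotypes are alt-allele dosages (0, 1, 2) and
# None marks a missing call. Category labels and data are hypothetical.


def discrepancy_rates(records):
    """Per-category fraction of GWAS/imputed call pairs that disagree.

    records: iterable of (variant_id, individual_id, category, gwas, imputed).
    Pairs where either call is missing are skipped.
    """
    compared = defaultdict(int)
    discordant = defaultdict(int)
    for _variant, _person, category, gwas, imputed in records:
        if gwas is None or imputed is None:
            continue  # only compare positions called by both sources
        compared[category] += 1
        if gwas != imputed:
            discordant[category] += 1
    return {cat: discordant[cat] / n for cat, n in compared.items()}


def transmissions(father, mother, child):
    """Count (alt, ref) alleles transmitted by heterozygous parents in one trio."""
    hets = [p for p in (father, mother) if p == 1]
    homs = [p for p in (father, mother) if p != 1]
    if not hets:
        return 0, 0  # no heterozygous parent: trio is uninformative for the TDT
    fixed_alt = sum(p // 2 for p in homs)  # a dosage-2 homozygote always passes alt
    alt_from_hets = child - fixed_alt      # alt alleles the het parent(s) passed on
    if not 0 <= alt_from_hets <= len(hets):
        raise ValueError("Mendelian inconsistency in trio")
    return alt_from_hets, len(hets) - alt_from_hets


def tdt_chi2(trios):
    """TDT statistic (b - c)^2 / (b + c), 1 df; 0.0 if no informative parents."""
    b = c = 0
    for father, mother, child in trios:
        alt, ref = transmissions(father, mother, child)
        b += alt
        c += ref
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


records = [
    ("rs1", "p1", "sequenced_missing_filled", 0, 1),
    ("rs1", "p2", "sequenced_missing_filled", 1, 1),
    ("rs1", "p3", "unsequenced_inferred", 2, 2),
    ("rs1", "p4", "unsequenced_inferred", None, 1),
]
print(discrepancy_rates(records))
# {'sequenced_missing_filled': 0.5, 'unsequenced_inferred': 0.0}

# A founder who is truly homozygous alt (2) passes alt to every child, so a
# 2 x 0 mating carries no TDT information. If that founder is wrongly imputed
# as heterozygous (1), every child looks like an alt transmission.
truth = [(2, 0, 1)] * 10
miscalled = [(1, 0, 1)] * 10
print(tdt_chi2(truth))      # 0.0
print(tdt_chi2(miscalled))  # 10.0
```

In the toy example the single miscalled founder turns an uninformative family into a statistic of 10.0, above the 3.84 critical value for 1 df at α = 0.05, even though nothing about transmission is actually observed. Treating each child as an independent trio is a simplification made here for brevity.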

Similar articles

Genotypic discrepancies arising from imputation.

BMC Proc. 2014 Jun 17;8(Suppl 1 Genetic Analysis Workshop 18):S17. doi: 10.1186/1753-6561-8-S1-S17. eCollection 2014.

Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle.

Genet Sel Evol. 2017 Feb 21;49(1):24. doi: 10.1186/s12711-017-0301-x.

Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond.

Genet Epidemiol. 2014 Sep;38 Suppl 1:S21-8. doi: 10.1002/gepi.21821.

Comparison among three variant callers and assessment of the accuracy of imputation from SNP array data to whole-genome sequence level in chicken.

BMC Genomics. 2015 Oct 21;16:824. doi: 10.1186/s12864-015-2059-2.

Whole-genome characterization in pedigreed non-human primates using genotyping-by-sequencing (GBS) and imputation.

BMC Genomics. 2016 Aug 24;17(1):676. doi: 10.1186/s12864-016-2966-x.

Genotype Imputation from Large Reference Panels.

Annu Rev Genomics Hum Genet. 2018 Aug 31;19:73-96. doi: 10.1146/annurev-genom-083117-021602. Epub 2018 May 23.

Accuracy of genotype imputation in sheep breeds.

Anim Genet. 2012 Feb;43(1):72-80. doi: 10.1111/j.1365-2052.2011.02208.x. Epub 2011 May 27.

Genotype imputation accuracy with different reference panels in admixed populations.

BMC Proc. 2014 Jun 17;8(Suppl 1 Genetic Analysis Workshop 18):S64. doi: 10.1186/1753-6561-8-S1-S64. eCollection 2014.

GWAS on Imputed Whole-Genome Resequencing From Genotyping-by-Sequencing Data for Farrowing Interval of Different Parities in Pigs.

Front Genet. 2019 Oct 18;10:1012. doi: 10.3389/fgene.2019.01012. eCollection 2019.

Imputation to whole-genome sequence using multiple pig populations and its use in genome-wide association studies.

Genet Sel Evol. 2019 Jan 24;51(1):2. doi: 10.1186/s12711-019-0445-y.

Cited by

Integrative Harmonization of Phenotypic and Genomic Data Improves Bone Mineral Density Prediction in Multi-Study Osteoporosis Research.

medRxiv. 2025 May 13:2025.05.12.25327471. doi: 10.1101/2025.05.12.25327471.

Prioritization of family member sequencing for the detection of rare variants.

BMC Proc. 2016 Oct 18;10(Suppl 7):227-231. doi: 10.1186/s12919-016-0035-8. eCollection 2016.

Family-based approaches: design, imputation, analysis, and beyond.

BMC Genet. 2016 Feb 3;17 Suppl 2(Suppl 2):9. doi: 10.1186/s12863-015-0318-5.

Identifying cryptic population structure in multigenerational pedigrees in a Mexican American sample.

BMC Proc. 2014 Jun 17;8(Suppl 1):S4. doi: 10.1186/1753-6561-8-S1-S4. eCollection 2014.

Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees.

BMC Proc. 2014 Jun 17;8(Suppl 1):S2. doi: 10.1186/1753-6561-8-S1-S2. eCollection 2014.

Value of Mendelian laws of segregation in families: data quality control, imputation, and beyond.

Genet Epidemiol. 2014 Sep;38 Suppl 1:S21-8. doi: 10.1002/gepi.21821.

References

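Data for Genetic Analysis Workshop 18: human whole genome sequence, blood pressure, and simulated phenotypes in extended pedigrees.

BMC Proc. 2014 Jun 17;8(Suppl 1):S2. doi: 10.1186/1753-6561-8-S1-S2. eCollection 2014.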

A ν-support vector regression based approach for predicting imputation quality.

BMC Proc. 2012 Nov 13;6 Suppl 7(Suppl 7):S3. doi: 10.1186/1753-6561-6-S7-S3.

MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.

Genet Epidemiol. 2010 Dec;34(8):816-34. doi: 10.1002/gepi.20533.

A new statistic to evaluate imputation reliability.

PLoS One. 2010 Mar 15;5(3):e9697. doi: 10.1371/journal.pone.0009697.

A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals.

Am J Hum Genet. 2009 Feb;84(2):210-23. doi: 10.1016/j.ajhg.2009.01.005. Epub 2009 Feb 5.

In silico method for inferring genotypes in pedigrees.

Nat Genet. 2006 Sep;38(9):1002-4. doi: 10.1038/ng1863. Epub 2006 Aug 20.

Merlin--rapid analysis of dense genetic maps using sparse gene flow trees.

Nat Genet. 2002 Jan;30(1):97-101. doi: 10.1038/ng786. Epub 2001 Dec 3.

Genetic and environmental contributions to cardiovascular risk factors in Mexican Americans. The San Antonio Family Heart Study.

Circulation. 1996 Nov 1;94(9):2159-70. doi: 10.1161/01.cir.94.9.2159.