Methods to impute missing genotypes for population data.

Authors

Yu Zhaoxia, Schaid Daniel J

Affiliation

Department of Statistics, University of California, Irvine, CA 92697, USA.

Publication information

Hum Genet. 2007 Dec;122(5):495-504. doi: 10.1007/s00439-007-0427-y. Epub 2007 Sep 13.

DOI:10.1007/s00439-007-0427-y

PMID:17851696

Abstract

For large-scale genotyping studies, it is common for most subjects to have some missing genetic markers, even if the missing rate per marker is low. This compromises association analyses, because varying numbers of subjects contribute to single-marker and multi-marker analyses. In this paper, we consider eight methods to infer missing genotypes: two haplotype reconstruction methods (local expectation maximization, EM, and fastPHASE), two k-nearest neighbor methods (the original k-nearest neighbor, KNN, and a weighted k-nearest neighbor, wtKNN), three linear regression methods (backward variable selection, LM.back; least angle regression, LM.lars; and singular value decomposition, LM.svd), and a regression tree, Rtree. We evaluate their accuracy using single nucleotide polymorphism (SNP) data from the HapMap project, under a variety of conditions and parameters. We find that fastPHASE has the lowest error rates across different analysis panels and marker densities. LM.lars gives slightly less accurate estimates of missing genotypes than fastPHASE, but performs better than the other methods.
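To make the k-nearest neighbor idea concrete, the following is a minimal sketch of neighbor-based genotype imputation and of an error-rate check on masked calls. It is not the authors' implementation. The genotype coding (0, 1, 2 for the minor-allele count), the use of `None` for a missing call, the distance measure, the inverse-distance weighting, and the names `knn_impute` and `error_rate` are all assumptions made for illustration; the abstract does not specify these details.

```python
from collections import defaultdict

MISSING = None  # assumed missing-call marker (hypothetical choice)


def distance(a, b):
    """Mean absolute genotype difference over markers observed in both subjects."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return float("inf")  # no overlap: treat as maximally distant
    return sum(abs(x - y) for x, y in shared) / len(shared)


def knn_impute(genotypes, k=3, weighted=False):
    """Fill each missing call by a vote among the k nearest subjects
    that have the marker observed. With weighted=True, votes are scaled
    by inverse distance (one assumed form of a weighted KNN)."""
    imputed = [row[:] for row in genotypes]
    for i, row in enumerate(genotypes):
        for j, call in enumerate(row):
            if call is not MISSING:
                continue
            # Candidate donors: other subjects with marker j observed.
            donors = sorted(
                (distance(row, other), other[j])
                for n, other in enumerate(genotypes)
                if n != i and other[j] is not MISSING
            )[:k]
            votes = defaultdict(float)
            for d, g in donors:
                votes[g] += 1.0 / (d + 1e-6) if weighted else 1.0
            if votes:  # leave the call missing if no donor exists
                # Ties resolve to the smallest genotype code.
                imputed[i][j] = max(sorted(votes), key=votes.get)
    return imputed


def error_rate(truth, masked, imputed):
    """Fraction of artificially masked calls that were imputed incorrectly."""
    total = wrong = 0
    for t_row, m_row, i_row in zip(truth, masked, imputed):
        for t, m, g in zip(t_row, m_row, i_row):
            if m is MISSING and t is not MISSING:
                total += 1
                wrong += g != t
    return wrong / total if total else 0.0


# Toy data: six subjects, four SNPs, two distinct genotype patterns.
truth = [
    [0, 0, 1, 2],
    [0, 0, 1, 2],
    [2, 2, 1, 0],
    [2, 2, 1, 0],
    [0, 0, 1, 2],
    [2, 2, 1, 0],
]
masked = [row[:] for row in truth]
masked[0][3] = MISSING  # hide two known calls to measure accuracy
masked[2][0] = MISSING

plain = knn_impute(masked, k=3)
inverse_distance = knn_impute(masked, k=3, weighted=True)
print(error_rate(truth, masked, plain), error_rate(truth, masked, inverse_distance))
```

Masking known calls and comparing the imputed values against the truth is one common way to estimate an imputation error rate; whether it matches the evaluation protocol of the paper in every detail is not stated in the abstract.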
