Methods to impute missing genotypes for population data.

Author information

Yu Zhaoxia, Schaid Daniel J

Affiliation

Department of Statistics, University of California, Irvine, CA 92697, USA.

Publication information

Hum Genet. 2007 Dec;122(5):495-504. doi: 10.1007/s00439-007-0427-y. Epub 2007 Sep 13.

Abstract

For large-scale genotyping studies, it is common for most subjects to have some missing genetic markers, even if the missing rate per marker is low. This compromises association analyses, because different numbers of subjects contribute to each single-marker or multi-marker analysis. In this paper, we consider eight methods to infer missing genotypes: two haplotype reconstruction methods (local expectation maximization, EM, and fastPHASE), two k-nearest neighbor methods (the original k-nearest neighbor, KNN, and a weighted k-nearest neighbor, wtKNN), three linear regression methods (backward variable selection, LM.back; least angle regression, LM.lars; and singular value decomposition, LM.svd), and a regression tree, Rtree. We evaluate their accuracy using single nucleotide polymorphism (SNP) data from the HapMap project under a variety of conditions and parameters. We find that fastPHASE has the lowest error rates across different analysis panels and marker densities. LM.lars gives slightly less accurate estimates of missing genotypes than fastPHASE but performs better than the other methods.
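
The abstract names k-nearest-neighbor imputation and error-rate comparisons without spelling out the mechanics. The sketch below is a generic KNN genotype imputer under assumed choices, not the authors' KNN or wtKNN implementation: genotypes are coded as minor-allele counts (0, 1, 2), distance is the mean absolute difference over markers observed in both subjects, the plain version takes a majority vote, and the hypothetical `weighted` option uses inverse-distance votes. It also sketches one way to estimate an error rate by hiding each observed genotype in turn and re-imputing it. All function names and the toy data are illustrative.

```python
from collections import Counter
from typing import List, Optional

# Genotypes coded as minor-allele counts (0, 1, 2); None marks a missing call.
Matrix = List[List[Optional[int]]]


def distance(a, b, skip):
    """Mean absolute genotype difference over markers observed in both
    subjects, ignoring the marker being imputed."""
    diffs = [abs(x - y) for j, (x, y) in enumerate(zip(a, b))
             if j != skip and x is not None and y is not None]
    return sum(diffs) / len(diffs) if diffs else float("inf")


def knn_impute(data: Matrix, i: int, j: int, k: int = 3,
               weighted: bool = False) -> Optional[int]:
    """Guess data[i][j] from the k subjects closest to subject i that have
    marker j observed. Unweighted: majority vote, ties going to the genotype
    of the nearer neighbour. Weighted (an assumed scheme): inverse-distance votes."""
    candidates = [(distance(data[i], row, j), row[j])
                  for r, row in enumerate(data)
                  if r != i and row[j] is not None]
    if not candidates:
        return None  # nobody has this marker observed
    candidates.sort(key=lambda t: t[0])  # stable: equal distances keep row order
    votes = Counter()
    for d, g in candidates[:k]:
        votes[g] += 1.0 / (d + 1e-6) if weighted else 1
    return votes.most_common(1)[0][0]


def masked_error_rate(data: Matrix, k: int = 3, weighted: bool = False) -> float:
    """Hide each observed genotype in turn, impute it, and return the
    fraction of hidden genotypes that were imputed incorrectly."""
    errors = total = 0
    for i, row in enumerate(data):
        for j, truth in enumerate(row):
            if truth is None:
                continue
            masked = [r[:] for r in data]
            masked[i][j] = None
            total += 1
            errors += knn_impute(masked, i, j, k, weighted) != truth
    return errors / total if total else 0.0


# Toy data: 5 subjects x 4 markers (illustrative only, not HapMap data).
toy = [
    [0, 1, 2, 0],
    [0, 1, 2, None],  # marker 3 missing for subject 1
    [2, 1, 0, 2],
    [2, 1, 0, 2],
    [0, 1, 2, 0],
]
print(knn_impute(toy, 1, 3, k=3))                 # → 0
print(knn_impute(toy, 2, 3, k=3))                 # → 0
print(knn_impute(toy, 2, 3, k=3, weighted=True))  # → 2
print(masked_error_rate(toy, k=1))                # → 0.0
```

For subject 2 with k = 3, the plain vote is carried by two dissimilar neighbours carrying genotype 0, while inverse-distance weighting lets the one identical neighbour win. This illustrates why a weighted variant can differ from plain KNN, though the weighting used here is an assumption rather than the paper's wtKNN definition.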
