Improving Population-Specific Allele Frequency Estimates by Adapting Supplemental Data: An Empirical Bayes Approach

Authors

Coram Marc, Tang Hua

Affiliation

Department of Health Research and Policy, Stanford University, Stanford, California 94305, USA.

Publication

Ann Appl Stat. 2007 Dec 12;1(2):459-479. doi: 10.1214/07-aoas121.

Abstract

Estimation of the allele frequency at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data becomes increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples will generally reduce the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased due to population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background and residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. This procedure adaptively incorporates genotypes from related samples, so that more similar samples have a greater influence on the estimates. In every example we have considered, our estimator achieves a mean squared error (MSE) that is smaller than that of either pooling or not pooling, and it sometimes improves substantially on both extremes. The bias introduced is small, as shown by a simulation study that is carefully matched to a real data example. Our method is particularly useful when small groups of individuals are genotyped at a large number of markers, a situation we are likely to encounter in a genome-wide association study.
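To make the pooling trade-off concrete, the following is a minimal sketch of one generic empirical Bayes shrinkage scheme. It is not the estimator from the paper, whose model is not given in the abstract. The sketch assumes a beta-binomial model in which the primary-population frequency at each SNP is drawn from a Beta prior centered on the supplemental-sample frequency, with a single prior strength shared by all markers and estimated by the method of moments. It also assumes the supplemental frequencies are known without sampling error. The function names `estimate_prior_strength` and `shrink_frequencies` and the cap `max_strength` are hypothetical.

```python
def estimate_prior_strength(primary_counts, n, supp_freqs, max_strength=1e6):
    """Method-of-moments estimate of a shared Beta prior strength M.

    Assumed model (illustrative only, not taken from the paper):
      p_j | q_j ~ Beta(M * q_j, M * (1 - q_j)),  x_j | p_j ~ Binomial(n, p_j)
    which implies Var(x_j / n | q_j) = q_j (1 - q_j) (M + n) / (n (M + 1)).
    """
    num = sum((x / n - q) ** 2 for x, q in zip(primary_counts, supp_freqs))
    den = sum(q * (1 - q) for q in supp_freqs)
    if den == 0:
        return max_strength  # supplemental frequencies carry no variation
    r = num / den  # observed spread relative to q (1 - q)
    if r * n <= 1:
        # Spread is no larger than binomial noise alone: samples look alike.
        return max_strength
    if r >= 1:
        # Spread is as large as it can be: the samples look unrelated.
        return 0.0
    return min(max_strength, n * (1 - r) / (r * n - 1))


def shrink_frequencies(primary_counts, n, supp_freqs, strength):
    """Posterior-mean frequency per SNP under the assumed Beta prior.

    strength = 0 reproduces the unpooled estimate x / n; a very large
    strength moves every estimate onto the supplemental frequency.
    """
    return [
        (x + strength * q) / (n + strength)
        for x, q in zip(primary_counts, supp_freqs)
    ]
```

Calling `estimate_prior_strength` on allele counts from many markers, then passing the result to `shrink_frequencies`, gives estimates that sit between the unpooled and supplemental values. The more closely the two samples agree across markers, the larger the estimated strength and the more weight the supplemental data receive, which mirrors the adaptive behavior the abstract describes.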
