Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets.

Affiliation

Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

Publication information

Eur J Hum Genet. 2011 Jun;19(6):662-6. doi: 10.1038/ejhg.2011.10. Epub 2011 Mar 2.

Abstract

Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 data set, which has been used to date for imputation, and the new HapMap3 data set, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better-calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low-frequency variants (frequency <5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low-frequency variants close to that of common ones. For low-frequency variants, reference set diversity can improve the accuracy of imputation, independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low-frequency variants is required. Our results suggest that, although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low-frequency variants, the larger sample sizes of the main project will.
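
As an illustration of how accuracy might be compared between low-frequency and common variants, the sketch below computes the squared Pearson correlation between true genotypes and imputed dosages, then averages it within two frequency classes split at the 5% threshold mentioned above. The abstract does not state which accuracy measure the authors used, so the choice of squared correlation is an assumption, and the function names (`maf`, `r_squared`, `accuracy_by_frequency`) and the toy data are hypothetical.

```python
from statistics import mean


def maf(genotypes):
    """Minor allele frequency from genotypes coded as 0/1/2 allele counts."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def r_squared(truth, dosage):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns None when either vector has zero variance (correlation undefined).
    """
    mt, md = mean(truth), mean(dosage)
    cov = sum((t - mt) * (d - md) for t, d in zip(truth, dosage))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_d = sum((d - md) ** 2 for d in dosage)
    if var_t == 0 or var_d == 0:
        return None
    return cov * cov / (var_t * var_d)


def accuracy_by_frequency(variants, threshold=0.05):
    """Mean r^2 for low-frequency (MAF < threshold) and common variants.

    `variants` is a list of (truth, dosage) pairs, one pair per variant.
    The 5% default follows the abstract's definition of low frequency.
    """
    bins = {"low_frequency": [], "common": []}
    for truth, dosage in variants:
        r2 = r_squared(truth, dosage)
        if r2 is None:
            continue  # skip variants where r^2 is undefined
        key = "low_frequency" if maf(truth) < threshold else "common"
        bins[key].append(r2)
    return {k: (mean(v) if v else None) for k, v in bins.items()}


# Toy data (made up for illustration only): one rare and one common variant.
rare_truth = [0] * 19 + [1]
rare_dosage = [0.0] * 19 + [1.0]
common_truth = [0, 1, 2, 1, 0, 2, 1, 1]
common_dosage = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 1.0, 1.0]
summary = accuracy_by_frequency(
    [(rare_truth, rare_dosage), (common_truth, common_dosage)]
)
```

Running the same summary for each candidate reference set (for example a HapMap2-based and a HapMap3-based imputation of the same held-out genotypes) would give a per-class comparison of the kind the abstract describes.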
