Genotype-imputation accuracy across worldwide human populations.

Authors

Huang Lucy, Li Yun, Singleton Andrew B, Hardy John A, Abecasis Gonçalo, Rosenberg Noah A, Scheet Paul

Affiliation

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.

Publication information

Am J Hum Genet. 2009 Feb;84(2):235-50. doi: 10.1016/j.ajhg.2009.01.013.

DOI:10.1016/j.ajhg.2009.01.013

PMID:19215730

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2668016/

Abstract

A current approach to mapping complex-disease-susceptibility loci in genome-wide association (GWA) studies involves leveraging the information in a reference database of dense genotype data. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and tested for disease association. This imputation strategy has been successful for GWA studies in populations well represented by existing reference panels. We used genotypes at 513,008 autosomal single-nucleotide polymorphism (SNP) loci in 443 unrelated individuals from 29 worldwide populations to evaluate the "portability" of the HapMap reference panels for imputation in studies of diverse populations. When a single HapMap panel was leveraged for imputation of randomly masked genotypes, European populations had the highest imputation accuracy, followed by populations from East Asia, Central and South Asia, the Americas, Oceania, the Middle East, and Africa. For each population, we identified "optimal" mixtures of reference panels that maximized imputation accuracy, and we found that in most populations, mixtures including individuals from at least two HapMap panels produced the highest imputation accuracy. From a separate survey of additional SNPs typed in the same samples, we evaluated imputation accuracy in the scenario in which all genotypes at a given SNP position were unobserved and were imputed on the basis of data from a commercial "SNP chip," again finding that most populations benefited from the use of combinations of two or more HapMap reference panels. Our results can serve as a guide for selecting appropriate reference panels for imputation-based GWA analysis in diverse populations.
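
The masking evaluation described in the abstract can be illustrated with a minimal Python sketch. It hides a random fraction of known genotypes, fills them in from a reference panel, and scores the result as the fraction of masked alleles imputed incorrectly. Everything here is an assumption made for illustration: the 0/1/2 genotype coding, the function names, the toy panels, and above all the imputation step, which simply copies the most common genotype at each SNP in the panel. That step is a naive stand-in for the linkage-disequilibrium-based models the abstract refers to; it is not the authors' method or their error measure.

```python
import random


def mask_genotypes(genotypes, fraction, rng):
    """Hide a random fraction of entries.

    Returns a masked copy (hidden entries set to None) and the set of
    hidden (individual, snp) positions.
    """
    positions = [(i, j) for i, row in enumerate(genotypes) for j in range(len(row))]
    n_hidden = int(round(fraction * len(positions)))
    hidden = set(rng.sample(positions, n_hidden))
    masked = [
        [None if (i, j) in hidden else g for j, g in enumerate(row)]
        for i, row in enumerate(genotypes)
    ]
    return masked, hidden


def impute_with_panel_mode(masked, panel):
    """Naive stand-in for imputation (assumption, not an LD-based model).

    Each missing genotype is replaced by the most common genotype at that
    SNP in the reference panel; ties go to the smaller genotype code.
    """
    modes = []
    for j in range(len(panel[0])):
        column = [row[j] for row in panel]
        modes.append(max(sorted(set(column)), key=column.count))
    return [[modes[j] if g is None else g for j, g in enumerate(row)] for row in masked]


def allelic_error_rate(truth, imputed, hidden):
    """Fraction of masked alleles imputed incorrectly.

    Genotypes are coded 0/1/2 (copies of one allele), so the absolute
    difference counts wrong alleles out of two per genotype.
    """
    if not hidden:
        raise ValueError("no masked positions to score")
    wrong = sum(abs(truth[i][j] - imputed[i][j]) for i, j in hidden)
    return wrong / (2 * len(hidden))


def best_panel(truth, masked, hidden, panels):
    """Score every candidate panel and return the name with the lowest error."""
    scores = {
        name: allelic_error_rate(truth, impute_with_panel_mode(masked, panel), hidden)
        for name, panel in panels.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: 4 study individuals typed at 3 SNPs (hypothetical values).
    truth = [[0, 2, 1], [0, 2, 1], [0, 1, 1], [0, 2, 2]]
    panels = {
        "panel_A": [[0, 2, 1], [0, 2, 1], [1, 2, 1]],  # similar to the study sample
        "panel_B": [[2, 0, 0], [2, 1, 0], [2, 0, 0]],  # dissimilar
    }
    rng = random.Random(0)
    masked, hidden = mask_genotypes(truth, 0.25, rng)
    name, scores = best_panel(truth, masked, hidden, panels)
    print(name, scores)
```

With real data, `panels` could hold candidate reference panels or mixtures of them, and the panel with the lowest error on the masked genotypes would be the one to prefer for that study population.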


Similar articles

Haplotype variation and genotype imputation in African populations.

Genet Epidemiol. 2011 Dec;35(8):766-80. doi: 10.1002/gepi.20626.

Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies.

BMC Genet. 2009 Jun 16;10:27. doi: 10.1186/1471-2156-10-27.

Validation of genotype imputation in Southeast Asian populations and the effect of single nucleotide polymorphism annotation on imputation outcome.

BMC Med Genet. 2018 Feb 13;19(1):23. doi: 10.1186/s12881-018-0534-8.

Comprehensive evaluation of imputation performance in African Americans.

J Hum Genet. 2012 Jul;57(7):411-21. doi: 10.1038/jhg.2012.43. Epub 2012 May 31.

Founder population-specific HapMap panel increases power in GWA studies through improved imputation accuracy and CNV tagging.

Genome Res. 2010 Oct;20(10):1344-51. doi: 10.1101/gr.106534.110. Epub 2010 Sep 1.

A generic coalescent-based framework for the selection of a reference panel for imputation.

Genet Epidemiol. 2010 Dec;34(8):773-82. doi: 10.1002/gepi.20505.

Assessing accuracy of genotype imputation in American Indians.

PLoS One. 2014 Jul 11;9(7):e102544. doi: 10.1371/journal.pone.0102544. eCollection 2014.

Improving power of association tests using multiple sets of imputed genotypes from distributed reference panels.

Genet Epidemiol. 2017 Dec;41(8):744-755. doi: 10.1002/gepi.22067. Epub 2017 Sep 1.

Genotype imputation performance of three reference panels using African ancestry individuals.

Hum Genet. 2018 Apr;137(4):281-292. doi: 10.1007/s00439-018-1881-4. Epub 2018 Apr 10.

Cited by

Methodological opportunities in genomic data analysis to advance health equity.

Nat Rev Genet. 2025 May 15. doi: 10.1038/s41576-025-00839-w.

MultiCook: A Tool That Improves Accuracy of HLA Imputation by Combining Probabilities From Multiple Reference Panels and Methods.

HLA. 2025 May;105(5):e70153. doi: 10.1111/tan.70153.

Common variation in meiosis genes shapes human recombination phenotypes and aneuploidy risk.

medRxiv. 2025 Apr 4:2025.04.02.25325097. doi: 10.1101/2025.04.02.25325097.

Sequencing whole genomes of the West Javanese population in Indonesia reveals novel variants and improves imputation accuracy.

Front Genet. 2025 Feb 7;15:1492602. doi: 10.3389/fgene.2024.1492602. eCollection 2024.

A genotype imputation reference panel specific for native Southeast Asian populations.

NPJ Genom Med. 2024 Oct 5;9(1):47. doi: 10.1038/s41525-024-00435-7.

Commonly used genomic arrays may lose information due to imperfect coverage of discovered variants for autism spectrum disorder.

J Neurodev Disord. 2024 Sep 12;16(1):54. doi: 10.1186/s11689-024-09571-8.

bioRxiv. 2024 Jun 14:2024.06.14.598981. doi: 10.1101/2024.06.14.598981.

Imputation accuracy across global human populations.

Am J Hum Genet. 2024 May 2;111(5):979-989. doi: 10.1016/j.ajhg.2024.03.011. Epub 2024 Apr 10.

Solving the Arizona search problem by imputation.

iScience. 2024 Jan 12;27(2):108831. doi: 10.1016/j.isci.2024.108831. eCollection 2024 Feb 16.

Genotype imputation accuracy and the quality metrics of the minor ancestry in multi-ancestry reference panels.

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad509.
