Improving power of association tests using multiple sets of imputed genotypes from distributed reference panels.

Authors

Zhou Wei, Fritsche Lars G, Das Sayantan, Zhang He, Nielsen Jonas B, Holmen Oddgeir L, Chen Jin, Lin Maoxuan, Elvestad Maiken B, Hveem Kristian, Abecasis Goncalo R, Kang Hyun Min, Willer Cristen J

Affiliations

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America.

K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway.

Publication information

Genet Epidemiol. 2017 Dec;41(8):744-755. doi: 10.1002/gepi.22067. Epub 2017 Sep 1.

DOI:10.1002/gepi.22067

PMID:28861891

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6324190/

Abstract

The accuracy of genotype imputation depends upon two factors: the sample size of the reference panel and the genetic similarity between the reference panel and the target samples. When consent restrictions prevent multiple reference panels from being combined, it is unclear how to combine the imputation results to optimize the power of genetic association studies. We compared the accuracy of 9,265 Norwegian genomes imputed from three reference panels: 1000 Genomes phase 3 (1000G), the Haplotype Reference Consortium (HRC), and a reference panel containing 2,201 Norwegian participants from the population-based Nord Trøndelag Health Study (HUNT), generated by low-pass genome sequencing. We observed that the population-matched reference panel allowed for imputation of more population-specific variants with lower frequency (minor allele frequency (MAF) between 0.05% and 0.5%). The overall imputation accuracy from the population-specific panel was substantially higher than that from 1000G and was comparable with that from HRC, despite HRC being 15-fold larger. These results recapitulate the value of population-specific reference panels for genotype imputation. We also evaluated different strategies to utilize multiple sets of imputed genotypes to increase the power of association studies. We observed that testing association for all variants imputed from any panel results in higher power to detect association than the alternative strategy of including only one version of each genetic variant, selected for having the highest imputation quality metric. This was particularly true for lower frequency variants (MAF < 1%), even after adjusting for the additional multiple testing burden.
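The two association-testing strategies compared in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ImputedTest` record, the function names, the toy quality values and p-values, and the use of a Bonferroni threshold (the abstract does not say how the multiple-testing adjustment was done). The toy data are chosen only to show how the two strategies can disagree; they do not reproduce any result from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImputedTest:
    """One association test for one variant imputed from one panel (hypothetical record)."""
    variant: str    # variant identifier
    panel: str      # reference panel used for imputation
    quality: float  # imputation quality metric (toy value)
    p_value: float  # association p-value (toy value)


def best_version_hits(tests, alpha=0.05):
    """Strategy 1: keep only the highest-quality version of each variant.

    The Bonferroni threshold (an assumption) is alpha / number of distinct variants.
    """
    if not tests:
        return set()
    best = {}
    for t in tests:
        if t.variant not in best or t.quality > best[t.variant].quality:
            best[t.variant] = t
    threshold = alpha / len(best)
    return {t.variant for t in best.values() if t.p_value < threshold}


def all_versions_hits(tests, alpha=0.05):
    """Strategy 2: test every imputed version from every panel.

    The Bonferroni threshold (an assumption) is alpha / total number of tests,
    which reflects the additional multiple-testing burden.
    """
    if not tests:
        return set()
    threshold = alpha / len(tests)
    return {t.variant for t in tests if t.p_value < threshold}


# Toy data: the best-quality version of rsA (HRC) is not significant,
# but a slightly lower-quality version (HUNT) is.
example_tests = [
    ImputedTest("rsA", "1000G", 0.60, 0.2),
    ImputedTest("rsA", "HRC", 0.90, 0.03),
    ImputedTest("rsA", "HUNT", 0.85, 0.001),
    ImputedTest("rsB", "HRC", 0.95, 0.5),
    ImputedTest("rsB", "HUNT", 0.80, 0.6),
]

print(best_version_hits(example_tests))
print(all_versions_hits(example_tests))
```

In this toy case the best-version strategy discards the HUNT-imputed version of `rsA` before testing, while the all-versions strategy keeps it and still passes the stricter threshold. Whether that trade-off pays off in real data is the empirical question the study addresses.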

