The hazards of genotype imputation when mapping disease susceptibility variants.

Affiliations

Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London, UK.

Department of Metabolism, Digestion and Reproduction, Section of Genetics and Genomics, London, UK.

Publication information

Genome Biol. 2024 Jan 3;25(1):7. doi: 10.1186/s13059-023-03140-3.

DOI:10.1186/s13059-023-03140-3

PMID:38172955

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10763476/

Abstract

BACKGROUND

The cost-free increase in statistical power of using imputation to infer missing genotypes is undoubtedly appealing, but is it hazard-free? This case study of three type-2 diabetes (T2D) loci demonstrates that it is not; it sheds light on why this is so and raises concerns as to the shortcomings of imputation at disease loci, where haplotypes differ between cases and reference panel.

RESULTS

T2D-associated variants were previously identified using targeted sequencing. We removed these significantly associated SNPs and used neighbouring SNPs to infer them by imputation. We compared imputed with observed genotypes, examined the altered pattern of T2D-SNP association, and investigated the cause of imputation errors by studying haplotype structure. Most T2D variants were incorrectly imputed with a low density of scaffold SNPs, but the majority failed to impute even at high density, despite obtaining high certainty scores. Missing and discordant imputation errors, which were observed disproportionately for the risk alleles, produced monomorphic genotype calls or false-negative associations. We show that haplotypes carrying risk alleles are considerably more common in the T2D cases than the reference panel, for all loci.
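The comparison of imputed with observed genotypes described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `compare_genotypes`, the 0/1/2 risk-allele-count coding, and the use of `None` for a missing call are all assumptions made for illustration. For one masked SNP, it tallies missing and discordant calls, reports the error rate among observed risk-allele carriers, and flags whether the imputed calls are monomorphic.

```python
import math
from typing import Optional, Sequence

# Assumed coding: number of risk alleles per sample (0, 1 or 2); None = missing call.
Genotype = Optional[int]


def compare_genotypes(observed: Sequence[Genotype],
                      imputed: Sequence[Genotype]) -> dict:
    """Compare imputed calls with observed (sequenced) calls for one SNP."""
    if len(observed) != len(imputed):
        raise ValueError("observed and imputed must have the same length")

    n = missing = discordant = 0
    carrier_total = carrier_errors = 0
    imputed_values = set()  # distinct non-missing imputed calls

    for obs, imp in zip(observed, imputed):
        if obs is None:
            continue  # no observed genotype to compare against
        n += 1
        is_carrier = obs > 0  # sample carries at least one risk allele
        carrier_total += is_carrier
        if imp is None:
            missing += 1
            carrier_errors += is_carrier
        else:
            imputed_values.add(imp)
            if imp != obs:
                discordant += 1
                carrier_errors += is_carrier

    def rate(count: int, total: int) -> float:
        return count / total if total else math.nan

    return {
        "n": n,
        "concordance": rate(n - missing - discordant, n),
        "missing_rate": rate(missing, n),
        "discordant_rate": rate(discordant, n),
        # Share of risk-allele carriers whose call was missing or wrong.
        "carrier_error_rate": rate(carrier_errors, carrier_total),
        # True when every non-missing imputed call is the same genotype.
        "imputed_monomorphic": len(imputed_values) <= 1,
    }
```

Calling `compare_genotypes([0, 0, 1, 2, 1, 0, 0, 1], [0, 0, 0, None, 1, 0, 0, 0])` on this toy input shows the pattern the abstract describes: the errors fall on the carriers of the risk allele, while non-carriers are imputed correctly.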

CONCLUSIONS

Imputation is not a panacea for fine mapping, nor for meta-analysing multiple GWAS based on different arrays and different populations. A total of 80% of the SNPs we have tested are not included in array platforms, explaining why these and other such associated variants may previously have been missed. Regardless of the choice of software and reference haplotypes, imputation drives genotype inference towards the reference panel, introducing errors at disease loci.
