Department of Genetics, Evolution and Environment, UCL Genetics Institute, University College London, London, UK.
Department of Metabolism, Digestion and Reproduction, Section of Genetics and Genomics, London, UK.
Genome Biol. 2024 Jan 3;25(1):7. doi: 10.1186/s13059-023-03140-3.
The cost-free increase in statistical power from using imputation to infer missing genotypes is undoubtedly appealing, but is it hazard-free? This case study of three type-2 diabetes (T2D) loci demonstrates that it is not; it sheds light on why this is so and raises concerns about the shortcomings of imputation at disease loci, where haplotypes differ between cases and the reference panel.
T2D-associated variants were previously identified using targeted sequencing. We removed these significantly associated SNPs and used neighbouring SNPs to infer them by imputation. We compared imputed with observed genotypes, examined the altered pattern of T2D-SNP association, and investigated the cause of imputation errors by studying haplotype structure. Most T2D variants were incorrectly imputed with a low density of scaffold SNPs, but the majority failed to impute even at high density, despite obtaining high certainty scores. Missing and discordant imputation errors, which were observed disproportionately for the risk alleles, produced monomorphic genotype calls or false-negative associations. We show that, for all loci, haplotypes carrying risk alleles are considerably more common in the T2D cases than in the reference panel.
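The comparison of imputed with observed genotypes can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `concordance_summary`, the 0/1/2 risk-allele-count coding, the use of `None` for a missing imputed call, and the toy genotypes are all assumptions made for illustration only.

```python
from typing import Optional, Sequence


def concordance_summary(
    observed: Sequence[int],
    imputed: Sequence[Optional[int]],
) -> dict:
    """Summarise imputation errors at one SNP.

    Assumed coding (hypothetical): each genotype is the number of risk
    alleles carried (0, 1 or 2); None marks a missing imputed call.
    """
    if len(observed) != len(imputed):
        raise ValueError("observed and imputed must have the same length")
    if not observed:
        raise ValueError("no genotypes supplied")

    n = len(observed)
    pairs = list(zip(observed, imputed))

    # Missing errors: the imputation produced no call at all.
    missing = sum(1 for _, imp in pairs if imp is None)

    # Discordant errors: a call was made but disagrees with the observed genotype.
    called = [(obs, imp) for obs, imp in pairs if imp is not None]
    discordant = sum(1 for obs, imp in called if obs != imp)

    # Errors among observed risk-allele carriers; a missing call counts as an error.
    carriers = [(obs, imp) for obs, imp in pairs if obs > 0]
    carrier_errors = sum(1 for obs, imp in carriers if imp != obs)

    return {
        "missing_rate": missing / n,
        "discordance_rate": discordant / len(called) if called else None,
        "carrier_error_rate": carrier_errors / len(carriers) if carriers else None,
        # True when every called genotype is identical (a monomorphic call set).
        "imputed_monomorphic": len({imp for _, imp in called}) <= 1,
    }


# Toy data only (not from the study): three risk-allele carriers, all mis-imputed.
toy_observed = [0, 0, 1, 2, 0, 1]
toy_imputed = [0, 0, 0, None, 0, 0]
summary = concordance_summary(toy_observed, toy_imputed)
```

In this toy case every observed carrier is either missing or called as a non-carrier, so the imputed SNP appears monomorphic, which is the kind of outcome that would erase an association signal.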
Imputation is not a panacea for fine mapping, nor for meta-analysing multiple GWAS based on different arrays and different populations. A total of 80% of the SNPs we tested are not included in array platforms, which explains why these and other such associated variants may previously have been missed. Regardless of the choice of software and reference haplotypes, imputation drives genotype inference towards the reference panel, introducing errors at disease loci.