Impact of pre-imputation SNP-filtering on genotype imputation results.

Publication information

BMC Genet. 2014 Aug 12;15:88. doi: 10.1186/s12863-014-0088-5.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4236550/

Abstract

BACKGROUND

Imputation of partially missing or unobserved genotypes is an indispensable tool for SNP data analyses. However, research and understanding of the impact of initial SNP-data quality control on imputation results is still limited. In this paper, we aim to evaluate the effect of different strategies of pre-imputation quality filtering on the performance of the widely used imputation algorithms MaCH and IMPUTE.

RESULTS

We considered three scenarios: imputation of partially missing genotypes with an external reference panel, imputation of partially missing genotypes without one, and imputation of completely untyped SNPs using an external reference panel. We first created various datasets by applying different SNP quality filters and masking certain percentages of randomly selected high-quality SNPs. We then imputed these SNPs and compared the results across filtering scenarios using both established and newly proposed measures of imputation quality. While the established measures assess the certainty of imputation results, our newly proposed measures focus on agreement with the true genotypes. These measures showed that pre-imputation SNP filtering might be detrimental to imputation quality. Moreover, the strongest drivers of imputation quality were, in general, the burden of missingness and the number of SNPs used for imputation. We also found that using a reference panel always improves imputation quality for partially missing genotypes. MaCH performed slightly better than IMPUTE2 in most of our scenarios. Again, these results were more pronounced when using our newly defined measures of imputation quality.
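The mask-impute-compare idea described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0/1/2 genotype coding, the uniform masking of individual genotype calls, and the simple concordance score are all assumptions made for illustration, and the most-common-genotype fill is only a stand-in for a real imputation tool such as MaCH or IMPUTE2.

```python
import random
from collections import Counter


def mask_genotypes(genotypes, fraction, rng):
    """Return a copy with `fraction` of the calls set to None, plus the masked positions.

    Rows are individuals, columns are SNPs (hypothetical layout).
    """
    cells = [(i, j) for i, row in enumerate(genotypes) for j in range(len(row))]
    masked = rng.sample(cells, round(fraction * len(cells)))
    out = [list(row) for row in genotypes]
    for i, j in masked:
        out[i][j] = None
    return out, masked


def impute_mode(genotypes):
    """Placeholder imputation: fill each missing call with that SNP's most common observed genotype."""
    out = [list(row) for row in genotypes]
    for j in range(len(genotypes[0])):
        observed = [row[j] for row in genotypes if row[j] is not None]
        fill = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in out:
            if row[j] is None:
                row[j] = fill
    return out


def concordance(truth, imputed, masked):
    """Fraction of masked calls whose imputed genotype equals the true genotype."""
    if not masked:
        return float("nan")
    hits = sum(truth[i][j] == imputed[i][j] for i, j in masked)
    return hits / len(masked)


# Toy data: 100 individuals x 50 SNPs, genotypes coded as minor-allele counts 0/1/2.
rng = random.Random(0)
truth = [[rng.choice([0, 0, 0, 1, 2]) for _ in range(50)] for _ in range(100)]
observed, masked = mask_genotypes(truth, 0.10, rng)
imputed = impute_mode(observed)
score = concordance(truth, imputed, masked)
print(f"masked calls: {len(masked)}, concordance with truth: {score:.3f}")
```

Scoring only the masked positions against the known genotypes corresponds to an agreement-based measure, as opposed to a certainty-based measure that the imputation software reports about itself.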

CONCLUSION

Even moderate filtering has a detrimental effect on imputation quality. Therefore, little or no SNP filtering prior to imputation appears to be the best strategy for imputing small to moderately sized datasets. Our results also showed that, for these datasets, MaCH performs slightly better than IMPUTE2 in most scenarios, at the cost of increased computing time.
