When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments?

Authors

Ramnarine Shelina, Zhang Juan, Chen Li-Shiun, Culverhouse Robert, Duan Weimin, Hancock Dana B, Hartz Sarah M, Johnson Eric O, Olfson Emily, Schwantes-An Tae-Hwi, Saccone Nancy L

Affiliations

Department of Genetics, Washington University, St. Louis, Missouri, United States of America.

Chinese Academy of Sciences, Key Laboratory of Brain Function and Disease, School of Life Sciences, University of Science and Technology of China, Hefei, Anhui, China.

Publication information

PLoS One. 2015 Oct 12;10(10):e0137601. doi: 10.1371/journal.pone.0137601. eCollection 2015.

DOI:10.1371/journal.pone.0137601

PMID:26458263

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4601794/

Abstract

Imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen's kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation was conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, compared to IQS, the other statistics tend to be more liberal in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants.
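To make the contrast between concordance rate and a chance-corrected score concrete, here is a minimal sketch. It assumes the usual Cohen's kappa form, (Po - Pc) / (1 - Pc), applied to a 3x3 table that accumulates imputed genotype probabilities against true genotypes. The function names and the toy data are illustrative assumptions, not the authors' code, and genotypes are coded 0, 1, 2 as minor-allele counts.

```python
def concordance_rate(true_genotypes, imputed_probs):
    """Fraction of samples whose best-guess genotype equals the true genotype."""
    matches = 0
    for genotype, probs in zip(true_genotypes, imputed_probs):
        best_guess = max(range(3), key=lambda k: probs[k])
        matches += int(best_guess == genotype)
    return matches / len(true_genotypes)


def iqs(true_genotypes, imputed_probs):
    """Kappa-style score (Po - Pc) / (1 - Pc) from a probability-weighted table."""
    n = len(true_genotypes)
    # table[imputed][true] accumulates the posterior probability mass.
    table = [[0.0] * 3 for _ in range(3)]
    for genotype, probs in zip(true_genotypes, imputed_probs):
        for k in range(3):
            table[k][genotype] += probs[k]
    observed = sum(table[k][k] for k in range(3)) / n
    row_totals = [sum(table[k]) for k in range(3)]
    col_totals = [sum(table[k][j] for k in range(3)) for j in range(3)]
    chance = sum(row_totals[k] * col_totals[k] for k in range(3)) / n ** 2
    if chance == 1.0:
        return float("nan")  # undefined when chance agreement is total
    return (observed - chance) / (1.0 - chance)


# Hypothetical rare variant: 98 non-carriers, 2 heterozygous carriers.
truth = [0] * 98 + [1] * 2
# An uninformative imputation that calls every sample homozygous reference.
all_reference = [(1.0, 0.0, 0.0)] * 100

print(concordance_rate(truth, all_reference))  # 0.98
print(iqs(truth, all_reference))               # 0.0
```

In this toy case the imputation never detects a carrier, yet concordance is 0.98 because almost every sample is homozygous reference. The chance-corrected score is 0, which illustrates why concordance can look inflated for rare and low frequency variants.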
