Comparison of multiple imputation and other methods for the analysis of imputed genotypes.

Affiliations

Division of Biostatistics, Institute for Health & Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA.

Center for Statistical Genetics, Gertrude H. Sergievsky Center, and the Department of Neurology, Columbia University Medical Center, New York, NY, USA.

Publication information

BMC Genomics. 2023 Jun 6;24(1):303. doi: 10.1186/s12864-023-09415-0.

DOI: 10.1186/s12864-023-09415-0

PMID: 37277705

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10242917/

Abstract

BACKGROUND

Analysis of imputed genotypes is an important and routine component of genome-wide association studies, and the increasing size of imputation reference panels has made it possible to impute low-frequency variants and test them for association. In the context of genotype imputation, the true genotype is unknown, and genotypes are inferred with uncertainty using statistical models. Here, we present a novel method for integrating imputation uncertainty into statistical association tests using a fully conditional multiple imputation (MI) approach, implemented with Substantive Model Compatible Fully Conditional Specification (SMCFCS). We compared the performance of this method with that of unconditional MI and of two additional approaches that have previously shown excellent performance: regression with dosages (Dosage) and a mixture of regression models (MRM).
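Regression with dosages, the simplest of the compared approaches, replaces each unknown genotype with its expected allele count and then fits the usual regression. The sketch below is illustrative only, not the authors' implementation: it assumes a quantitative trait, additive coding of a hypothetical allele B, per-person genotype probabilities given as (P(AA), P(AB), P(BB)), and no covariates. The function names `dosage` and `dosage_regression` are hypothetical.

```python
import math


def dosage(probs):
    """Expected count of allele B from (P(AA), P(AB), P(BB)): P(AB) + 2 * P(BB)."""
    _p_aa, p_ab, p_bb = probs
    return p_ab + 2.0 * p_bb


def dosage_regression(geno_probs, y):
    """Least-squares fit of y = b0 + b1 * dosage + e.

    Returns (b1, standard error of b1, Wald statistic b1 / se).
    Assumes at least 3 people and non-constant dosages.
    """
    x = [dosage(p) for p in geno_probs]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Residual variance with n - 2 degrees of freedom (intercept + slope).
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b1, se, b1 / se
```

With hard-called probabilities such as (0, 1, 0), the dosage equals the called genotype, so the test reduces to an ordinary regression on genotypes. Uncertain genotypes are simply shrunk toward their expected value, which keeps the cost to a single model fit per variant.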

RESULTS

Our simulations considered a range of allele frequencies and imputation qualities based on data from the UK Biobank. We found that unconditional MI was computationally costly and overly conservative across a wide range of settings. Compared with unconditional MI, analyzing data with Dosage, MRM, or MI SMCFCS yielded greater power, including for low-frequency variants, while effectively controlling type I error rates. MRM and MI SMCFCS are both more computationally intensive than Dosage.
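To illustrate why unconditional MI needs many model fits, here is a minimal sketch of it for a quantitative trait. Genotypes are drawn from the imputation probabilities alone, without conditioning on the phenotype. The regression is refit on each of m completed datasets, and the results are pooled with Rubin's rules. The function names, m = 20, and the seed are illustrative assumptions. SMCFCS, which draws genotypes conditional on the outcome model, is not shown.

```python
import math
import random


def _ols_slope(x, y):
    """Least-squares slope of y ~ b0 + b1 * x and its estimated variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b1, rss / (n - 2) / sxx


def rubin_pool(estimates, variances):
    """Rubin's rules: T = W + (1 + 1/m) * B; returns (pooled estimate, sqrt(T))."""
    m = len(estimates)
    qbar = sum(estimates) / m
    within = sum(variances) / m                                   # W
    between = sum((q - qbar) ** 2 for q in estimates) / (m - 1)   # B
    total = within + (1 + 1 / m) * between
    return qbar, math.sqrt(total)


def unconditional_mi(geno_probs, y, m=20, seed=0):
    """Draw hard genotypes from the imputation probabilities only (y is ignored),
    refit the regression on each completed dataset, then pool the m fits."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        g = [rng.choices((0, 1, 2), weights=p)[0] for p in geno_probs]
        b1, var_b1 = _ols_slope(g, y)
        estimates.append(b1)
        variances.append(var_b1)
    return rubin_pool(estimates, variances)
```

Each variant costs m regressions instead of one. Because the draws ignore y, the between-imputation term B adds genotype noise to the standard error; this is offered as one plausible contributor to the conservatism reported above, not as the paper's own analysis.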

CONCLUSIONS

The unconditional MI approach to association testing is overly conservative, and we do not recommend its use with imputed genotypes. Given its performance, speed, and ease of implementation, we recommend using Dosage for imputed genotypes with MAF ≥ 0.001 and Rsq ≥ 0.3.
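The recommended thresholds (MAF of at least 0.001 and Rsq of at least 0.3) can be applied as a per-variant filter before running the Dosage test. The sketch below assumes the dosages for one variant are available as a list. It approximates Rsq as the observed dosage variance divided by its binomial expectation 2p(1 - p) under Hardy-Weinberg equilibrium, a ratio similar in spirit to the Rsq reported by MaCH/Minimac. In practice, Rsq would be read from the imputation software's output rather than recomputed. The function names are hypothetical.

```python
def allele_freq(dosages):
    """Estimated frequency p of the counted allele from dosages in [0, 2]."""
    return sum(dosages) / (2 * len(dosages))


def maf(dosages):
    """Minor allele frequency min(p, 1 - p)."""
    p = allele_freq(dosages)
    return min(p, 1 - p)


def approx_rsq(dosages):
    """Approximate imputation quality: observed dosage variance / (2p(1 - p))."""
    n = len(dosages)
    p = allele_freq(dosages)
    expected = 2 * p * (1 - p)
    if expected == 0:  # monomorphic variant: no information
        return 0.0
    mean = 2 * p
    observed = sum((d - mean) ** 2 for d in dosages) / n
    return observed / expected


def use_dosage(dosages, maf_min=0.001, rsq_min=0.3):
    """True if the variant meets both thresholds for the Dosage analysis."""
    return maf(dosages) >= maf_min and approx_rsq(dosages) >= rsq_min
```

A variant whose dosages all sit at the population mean 2p carries no per-person information, so its approximate Rsq is near 0 and it is filtered out even if it is common.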

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6aa2/10242917/b2a864bf24cc/12864_2023_9415_Fig1_HTML.jpg
