Genotype imputation with thousands of genomes.

Publication information

G3 (Bethesda). 2011 Nov;1(6):457-70. doi: 10.1534/g3.111.001198. Epub 2011 Nov 1.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3276165/

Abstract

Genotype imputation is a statistical technique that is often used to increase the power and resolution of genetic association studies. Imputation methods work by using haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset, and a number of approaches have been proposed for choosing subsets of reference haplotypes that will maximize accuracy in a given study population. These panel selection strategies become harder to apply and interpret as sequencing efforts like the 1000 Genomes Project produce larger and more diverse reference sets, which led us to develop an alternative framework. Our approach is built around a new approximation that uses local sequence similarity to choose a custom reference panel for each study haplotype in each region of the genome. This approximation makes it computationally efficient to use all available reference haplotypes, which allows us to bypass the panel selection step and to improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. Using data from HapMap 3, we show that our framework produces accurate results in a wide range of human populations. We also use data from the Malaria Genetic Epidemiology Network (MalariaGEN) to provide recommendations for imputation-based studies in Africa. We demonstrate that our approximation improves efficiency in large, sequence-based reference panels, and we discuss general computational strategies for modern reference datasets. Genome-wide association studies will soon be able to harness the power of thousands of reference genomes, and our work provides a practical way for investigators to use this rich information. New methodology from this study is implemented in the IMPUTE2 software package.
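The abstract describes choosing a custom reference panel for each study haplotype from local sequence similarity. A minimal sketch of that idea follows. It assumes that similarity is measured as Hamming distance over the typed markers in one genomic region and that a fixed number `k` of closest reference haplotypes is kept. The abstract does not specify either choice, and the function names and toy data are hypothetical, so this illustrates the selection step and is not the IMPUTE2 implementation.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return sum(x != y for x, y in zip(a, b))


def select_custom_panel(
    study_hap: Sequence[int],
    reference_haps: Sequence[Sequence[int]],
    k: int,
) -> List[int]:
    """Return indices of the k reference haplotypes closest to study_hap.

    Assumption: closeness is Hamming distance over the markers of a single
    region; ties are broken by the lower reference index.
    """
    scored = [(hamming(study_hap, ref), idx) for idx, ref in enumerate(reference_haps)]
    scored.sort()
    return [idx for _, idx in scored[:k]]


# Toy region with five typed markers (alleles coded 0/1); hypothetical data.
reference = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
study = [0, 0, 1, 1, 0]

panel = select_custom_panel(study, reference, k=2)
print(panel)
```

In a full pipeline this selection would be repeated for every study haplotype and every region, so that each one is imputed against its own small panel drawn from the complete reference set.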

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/441c/3276165/a9f79fc546d7/457f1.jpg
