Genotype imputation reference panel selection using maximal phylogenetic diversity.

Affiliation

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109.

Publication information

Genetics. 2013 Oct;195(2):319-30. doi: 10.1534/genetics.113.154591. Epub 2013 Aug 9.

DOI:10.1534/genetics.113.154591

PMID:23934887

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3781962/

Abstract

The recent dramatic cost reduction of next-generation sequencing technology enables investigators to assess most variants in the human genome to identify risk variants for complex diseases. However, sequencing large samples remains very expensive. For a study sample with existing genotype data, such as array data from genome-wide association studies, a cost-effective approach is to sequence a subset of the study sample and then to impute the rest of the study sample, using the sequenced subset as a reference panel. The use of such an internal reference panel identifies population-specific variants and avoids the problem of a substantial mismatch in ancestry background between the study population and the reference population. To efficiently select an internal panel, we introduce an idea of phylogenetic diversity from mathematical phylogenetics and comparative genomics. We propose the "most diverse reference panel", defined as the subset with the maximal "phylogenetic diversity", thereby incorporating individuals that span a diverse range of genotypes within the sample. Using data both from simulations and from the 1000 Genomes Project, we show that the most diverse reference panel can substantially improve the imputation accuracy compared to randomly selected reference panels, especially for the imputation of rare variants. The improvement in imputation accuracy holds across different marker densities, reference panel sizes, and lengths for the imputed segments. We thus propose a novel strategy for planning sequencing studies on samples with existing genotype data.
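
The selection idea in the abstract can be illustrated with a minimal sketch. It assumes a tree relating the genotyped individuals is already available as a list of weighted edges; the abstract does not say how the tree is built or which optimization procedure is used, so the greedy strategy, the toy tree, and all names below (`build_adjacency`, `phylogenetic_diversity`, `most_diverse_panel`) are illustrative assumptions rather than the authors' implementation. Phylogenetic diversity is taken here as the total branch length of the smallest subtree connecting the chosen individuals.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Turn (node_u, node_v, branch_length) triples into an adjacency map."""
    adj = defaultdict(dict)
    for u, v, length in edges:
        adj[u][v] = length
        adj[v][u] = length
    return adj


def parents_from(adj, root):
    """Return {node: parent} for the tree rooted at `root` (iterative DFS)."""
    parent = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for neighbor in adj[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                stack.append(neighbor)
    return parent


def phylogenetic_diversity(adj, subset):
    """Total branch length of the minimal subtree connecting `subset`."""
    subset = list(subset)
    if len(subset) < 2:
        return 0.0
    # Root at one selected leaf; the connecting subtree is then the union
    # of the paths from every other selected leaf up to that root.
    parent = parents_from(adj, subset[0])
    used = set()  # edges stored as (child, parent)
    for leaf in subset[1:]:
        node = leaf
        while parent[node] is not None and (node, parent[node]) not in used:
            used.add((node, parent[node]))
            node = parent[node]
    return sum(adj[child][par] for child, par in used)


def most_diverse_panel(edges, leaves, k):
    """Greedily pick k leaves: seed with the most distant pair, then keep
    adding the leaf that increases phylogenetic diversity the most."""
    if not 2 <= k <= len(leaves):
        raise ValueError("k must be between 2 and the number of leaves")
    adj = build_adjacency(edges)
    panel = list(max(combinations(leaves, 2),
                     key=lambda pair: phylogenetic_diversity(adj, pair)))
    while len(panel) < k:
        remaining = [leaf for leaf in leaves if leaf not in panel]
        panel.append(max(remaining,
                         key=lambda leaf: phylogenetic_diversity(adj, panel + [leaf])))
    return panel, phylogenetic_diversity(adj, panel)


# Hypothetical toy tree: five individuals (A-E) and three internal nodes.
edges = [
    ("A", "n1", 1.0), ("B", "n1", 1.0), ("n1", "n2", 0.5),
    ("C", "n2", 3.0), ("n2", "n3", 0.5),
    ("D", "n3", 2.0), ("E", "n3", 0.2),
]
leaves = ["A", "B", "C", "D", "E"]

panel, pd_value = most_diverse_panel(edges, leaves, 3)
print(panel, pd_value)  # ['C', 'D', 'A'] 7.0
```

In this toy example the panel of three skips E, which sits on a very short branch next to D, in favor of A, which brings in a lineage the panel does not yet cover. A randomly chosen panel has no such guarantee, which is the intuition behind the reported gain for rare variants.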

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/af30/3781962/89ae5c895baf/319fig1.jpg

Similar articles

Genotype imputation reference panel selection using maximal phylogenetic diversity.

Genetics. 2013 Oct;195(2):319-30. doi: 10.1534/genetics.113.154591. Epub 2013 Aug 9.

Choosing Subsamples for Sequencing Studies by Minimizing the Average Distance to the Closest Leaf.

Genetics. 2015 Oct;201(2):499-511. doi: 10.1534/genetics.115.176909. Epub 2015 Aug 24.

Comparison of genotype imputation strategies using a combined reference panel for chicken population.

Animal. 2019 Jun;13(6):1119-1126. doi: 10.1017/S1751731118002860. Epub 2018 Oct 29.

Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs.

Eur J Hum Genet. 2015 Jul;23(7):975-83. doi: 10.1038/ejhg.2014.216. Epub 2014 Oct 8.

Performance of genotype imputation for low frequency and rare variants from the 1000 genomes.

PLoS One. 2015 Jan 26;10(1):e0116487. doi: 10.1371/journal.pone.0116487. eCollection 2015.

Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy.

Sci Rep. 2017 Jul 27;7(1):6733. doi: 10.1038/s41598-017-06905-6.

Imputation-based assessment of next generation rare exome variant arrays.

Pac Symp Biocomput. 2014:241-52.

A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data.

BMC Genomics. 2015 Dec 29;16:1109. doi: 10.1186/s12864-015-2192-y.

Sequencing and imputation in GWAS: Cost-effective strategies to increase power and genomic coverage across diverse populations.

Genet Epidemiol. 2020 Sep;44(6):537-549. doi: 10.1002/gepi.22326. Epub 2020 Jun 9.

Evaluation of the imputation performance of the program IMPUTE in an admixed sample from Mexico City using several model designs.

BMC Med Genomics. 2012 May 1;5:12. doi: 10.1186/1755-8794-5-12.

Cited by

Optimizing strain selection for association studies under hard cost constraints.

bioRxiv. 2025 Jun 3:2025.05.31.657208. doi: 10.1101/2025.05.31.657208.

How local reference panels improve imputation in French populations.

Sci Rep. 2024 Jan 3;14(1):370. doi: 10.1038/s41598-023-49931-3.

Transethnic analysis of psoriasis susceptibility in South Asians and Europeans enhances fine-mapping in the MHC and genomewide.

HGG Adv. 2022 Jan 13;3(1). doi: 10.1016/j.xhgg.2021.100069. Epub 2021 Nov 6.

RefRGim: an intelligent reference panel reconstruction method for genotype imputation with convolutional neural networks.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab326.

Accurate Imputation of Untyped Variants from Deep Sequencing Data.

Methods Mol Biol. 2021;2243:271-281. doi: 10.1007/978-1-0716-1103-6_13.

Accuracy of Imputation of Microsatellite Markers from a 50K SNP Chip in Spanish Assaf Sheep.

Animals (Basel). 2021 Jan 5;11(1):86. doi: 10.3390/ani11010086.

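Sequencing and imputation in GWAS: Cost-effective strategies to increase power and genomic coverage across diverse populations.

Genet Epidemiol. 2020 Sep;44(6):537-549. doi: 10.1002/gepi.22326. Epub 2020 Jun 9.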

Adapting Genotyping-by-Sequencing and Variant Calling for Heterogeneous Stock Rats.

G3 (Bethesda). 2020 Jul 7;10(7):2195-2205. doi: 10.1534/g3.120.401325.

Improving Imputation Quality in BEAGLE for Crop and Livestock Data.

G3 (Bethesda). 2020 Jan 7;10(1):177-188. doi: 10.1534/g3.119.400798.

African genetic diversity provides novel insights into evolutionary history and local adaptations.

Hum Mol Genet. 2018 Aug 1;27(R2):R209-R218. doi: 10.1093/hmg/ddy161.

References

Methods of tagSNP selection and other variables affecting imputation accuracy in swine.

BMC Genet. 2013 Feb 21;14:8. doi: 10.1186/1471-2156-14-8.

A sample selection strategy for next-generation sequencing.

Genet Epidemiol. 2012 Nov;36(7):696-709. doi: 10.1002/gepi.21664. Epub 2012 Aug 3.

Fast and accurate genotype imputation in genome-wide association studies through pre-phasing.

Nat Genet. 2012 Jul 22;44(8):955-9. doi: 10.1038/ng.2354.

A coalescent model for genotype imputation.

Genetics. 2012 Aug;191(4):1239-55. doi: 10.1534/genetics.111.137984. Epub 2012 May 17.

Imputation of single-nucleotide polymorphisms in inbred mice using local phylogeny.

Genetics. 2012 Feb;190(2):449-58. doi: 10.1534/genetics.111.132381.

Retention of agronomically important variation in germplasm core collections: implications for allele mining.

Theor Appl Genet. 2012 Apr;124(6):1155-71. doi: 10.1007/s00122-011-1776-4. Epub 2012 Jan 7.

Haplotype variation and genotype imputation in African populations.

Genet Epidemiol. 2011 Dec;35(8):766-80. doi: 10.1002/gepi.20626.

Imputation of low-frequency variants using the HapMap3 benefits from large, diverse reference sets.

Eur J Hum Genet. 2011 Jun;19(6):662-6. doi: 10.1038/ejhg.2011.10. Epub 2011 Mar 2.

Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes.

Am J Hum Genet. 2010 Nov 12;87(5):604-17. doi: 10.1016/j.ajhg.2010.10.012.

MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes.

Genet Epidemiol. 2010 Dec;34(8):816-34. doi: 10.1002/gepi.20533.
