Imputation-based assessment of next generation rare exome variant arrays.

Authors

Martin Alicia R, Tse Gerard, Bustamante Carlos D, Kenny Eimear E

Affiliations

Department of Genetics & Biomedical Informatics Training Program, Stanford University, Stanford, CA, 94305, USA.

Publication information

Pac Symp Biocomput. 2014:241-52.

PMID: 24297551

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3900244/

Abstract

A striking finding from recent large-scale sequencing efforts is that the vast majority of variants in the human genome are rare and found within single populations or lineages. These observations hold important implications for the design of the next round of disease variant discovery efforts: if genetic variants that influence disease risk follow the same trend, then we expect to see population-specific disease associations that require large sample sizes for detection. To address this challenge, and due to the still prohibitive cost of sequencing large cohorts, researchers have developed a new generation of low-cost genotyping arrays that assay rare variation previously identified from large exome sequencing studies. Genotyping approaches rely not only on directly observing variants, but also on phasing and imputation methods that use publicly available reference panels to infer unobserved variants in a study cohort. Rare variant exome arrays are intentionally enriched for variants likely to be disease causing, and here we assay the ability of the first commercially available rare exome variant array (the Illumina Infinium HumanExome BeadChip) to also tag other potentially damaging variants not molecularly assayed. Using full sequence data from chromosome 22 from the phase I 1000 Genomes Project, we evaluate three methods for imputation (BEAGLE, MaCH-Admix, and SHAPEIT2/IMPUTE2) with the rare exome variant array under varied study panel sizes, reference panel sizes, and LD structures via population differences. We find that imputation is more accurate across both the genome and exome for common variant arrays than the next generation array for all allele frequencies, including rare alleles. We also find that imputation is the least accurate in African populations, and accuracy is substantially improved for rare variants when the same population is included in the reference panel. Depending on the goals of GWAS researchers, our results will aid budget decisions by helping determine whether money is best spent sequencing the genomes of smaller sample sizes, genotyping larger sample sizes with rare and/or common variant arrays and imputing SNPs, or some combination of the two.
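
The abstract compares imputation accuracy across allele frequencies but does not name the accuracy metric. As a minimal sketch, assuming accuracy is measured as the squared Pearson correlation between true genotypes (coded 0/1/2) and imputed dosages, the comparison could be summarized per minor allele frequency (MAF) bin as below. The function names, the choice of metric, and the bin edges are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation per variant (rows = variants, columns = samples).

    Assumed metric for illustration; returns NaN where either row has no variance.
    """
    t = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(imputed_dosages, dtype=float)
    t_c = t - t.mean(axis=1, keepdims=True)
    d_c = d - d.mean(axis=1, keepdims=True)
    cov = (t_c * d_c).sum(axis=1)
    denom = (t_c ** 2).sum(axis=1) * (d_c ** 2).sum(axis=1)
    r2 = np.full(t.shape[0], np.nan)
    ok = denom > 0
    r2[ok] = cov[ok] ** 2 / denom[ok]
    return r2


def mean_r2_by_maf(true_genotypes, imputed_dosages,
                   bin_edges=(0.0, 0.005, 0.05, 0.5)):
    """Mean r^2 within MAF bins (lo, hi]; the bin edges are hypothetical."""
    t = np.asarray(true_genotypes, dtype=float)
    alt_freq = t.mean(axis=1) / 2.0          # alternate allele frequency
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    r2 = dosage_r2(t, imputed_dosages)
    summary = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (maf > lo) & (maf <= hi) & ~np.isnan(r2)
        summary[(lo, hi)] = float(r2[in_bin].mean()) if in_bin.any() else float("nan")
    return summary
```

Running `mean_r2_by_maf` once per array design, imputation method, and reference panel would give a table of accuracy by frequency class that mirrors the kind of comparison the abstract describes.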
