Department of Human Genetics, University of Chicago, Chicago, Illinois, USA.
Nat Genet. 2012 Jul 22;44(8):955-9. doi: 10.1038/ng.2354.
The 1000 Genomes Project and disease-specific sequencing efforts are producing large collections of haplotypes that can be used as reference panels for genotype imputation in genome-wide association studies (GWAS). However, imputing from large reference panels with existing methods imposes a high computational burden. We introduce a strategy called 'pre-phasing' that maintains the accuracy of leading methods while reducing computational costs. We first statistically estimate the haplotypes for each individual within the GWAS sample (pre-phasing) and then impute missing genotypes into these estimated haplotypes. This reduces the computational cost because (i) the GWAS samples must be phased only once, whereas standard methods would implicitly repeat phasing with each reference panel update, and (ii) it is much faster to match a phased GWAS haplotype to one reference haplotype than to match two unphased GWAS genotypes to a pair of reference haplotypes. We implemented our approach in the MaCH and IMPUTE2 frameworks, and we tested it on data sets from the Wellcome Trust Case Control Consortium 2 (WTCCC2), the Genetic Association Information Network (GAIN), the Women's Health Initiative (WHI) and the 1000 Genomes Project. This strategy will be particularly valuable for repeated imputation as reference panels evolve.
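Point (ii) can be illustrated by counting hidden states, as a rough sketch rather than a description of how MaCH or IMPUTE2 are implemented. It assumes a copying-style hidden Markov model in which a phased haplotype copies from one of N reference haplotypes, while an unphased genotype copies from an ordered pair of them. The function names and the panel size of 1,000 haplotypes are hypothetical and chosen only for illustration.

```python
def haploid_states(n_ref_haplotypes: int) -> int:
    """Hidden states per marker when one phased GWAS haplotype
    is matched to a single reference haplotype (assumed model)."""
    return n_ref_haplotypes


def diploid_states(n_ref_haplotypes: int) -> int:
    """Hidden states per marker when an unphased GWAS genotype
    is matched to an ordered pair of reference haplotypes (assumed model)."""
    return n_ref_haplotypes ** 2


def state_ratio(n_ref_haplotypes: int) -> float:
    """Diploid state count divided by the states visited when the
    individual's two pre-phased haplotypes are each matched separately."""
    return diploid_states(n_ref_haplotypes) / (2 * haploid_states(n_ref_haplotypes))


if __name__ == "__main__":
    n = 1000  # hypothetical reference panel size, not taken from the paper
    print(haploid_states(n), diploid_states(n), state_ratio(n))
```

Under these assumptions the per-marker state space grows linearly with panel size after pre-phasing and quadratically without it. State counts are only a proxy for running time, since real implementations use further approximations.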