Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
Am J Hum Genet. 2013 Apr 4;92(4):504-16. doi: 10.1016/j.ajhg.2013.02.011.
The recent emergence of the common-disease-rare-variant hypothesis has renewed interest in the use of large pedigrees for identifying rare causal variants. Genotyping with modern sequencing platforms is increasingly common in the search for such variants but remains expensive and is often limited to only a few subjects per pedigree. In population-based samples, genotype imputation is widely used so that additional genotyping is not needed. We now introduce an analogous approach that enables computationally efficient imputation in large pedigrees. Our approach samples inheritance vectors (IVs) from a Markov chain Monte Carlo (MCMC) sampler by conditioning on genotypes from a sparse set of framework markers. Missing genotypes are then probabilistically inferred from these IVs together with the observed dense genotypes that are available on a subset of subjects. We implemented our approach in the Genotype Imputation Given Inheritance (GIGI) program and evaluated it on both simulated and real large pedigrees. With a real pedigree, we also compared the results imputed by this approach with those from the population-based imputation program BEAGLE. We demonstrated that our pedigree-based approach imputes many alleles with high accuracy. It is much more accurate for calling rare alleles than population-based imputation and does not require an outside reference sample. We also evaluated the effect of varying other parameters, including the marker type and density of the framework panel, the threshold for calling genotypes, and the population allele frequencies. By leveraging information from existing genotypes already assayed on large pedigrees, our approach can facilitate cost-effective use of sequence data in the pursuit of rare causal variants.
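To make the imputation step concrete, the following Python sketch shows one way a missing genotype could be inferred from sampled IVs and the dense genotypes observed on relatives. It is a toy illustration under stated assumptions, not the GIGI implementation: the five-person pedigree, the function names (`trace_fgls`, `genotype_probs`, `impute`, `call_genotype`), the encoding of an IV as one pair of 0/1 choices per non-founder, and the brute-force enumeration over founder alleles are all hypothetical. The IVs are supplied by hand here, whereas in the approach described above they would come from an MCMC sampler conditioned on framework markers.

```python
from itertools import product

# Hypothetical toy pedigree: person -> (father, mother); founders have (None, None).
# Parents must be listed before their children.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "dad": ("gf", "gm"),
    "mom": (None, None),
    "kid": ("dad", "mom"),
}


def trace_fgls(pedigree, iv):
    """Map each person to the founder genome labels (FGLs) of their two alleles.

    iv[person] = (p, m): p selects which of the father's two alleles (0 or 1)
    was transmitted, and m does the same for the mother.
    """
    fgl, next_label = {}, 0
    for person, (father, mother) in pedigree.items():
        if father is None:  # founder: two fresh labels
            fgl[person] = (next_label, next_label + 1)
            next_label += 2
        else:
            p, m = iv[person]
            fgl[person] = (fgl[father][p], fgl[mother][m])
    return fgl, next_label


def genotype_probs(pedigree, iv, observed, target, alt_freq):
    """P(alt-allele count of target = 0, 1, 2 | observed genotypes, one IV)."""
    fgl, n_labels = trace_fgls(pedigree, iv)
    weights = [0.0, 0.0, 0.0]
    # Enumerate every assignment of ref (0) / alt (1) alleles to the FGLs.
    for alleles in product((0, 1), repeat=n_labels):
        consistent = all(
            alleles[fgl[s][0]] + alleles[fgl[s][1]] == g for s, g in observed.items()
        )
        if not consistent:
            continue
        prior = 1.0
        for a in alleles:  # founder alleles drawn from the population frequency
            prior *= alt_freq if a else 1.0 - alt_freq
        a, b = fgl[target]
        weights[alleles[a] + alleles[b]] += prior
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observed genotypes are incompatible with this IV")
    return [w / total for w in weights]


def impute(pedigree, ivs, observed, target, alt_freq):
    """Average the per-IV genotype probabilities over all sampled IVs."""
    per_iv = [genotype_probs(pedigree, iv, observed, target, alt_freq) for iv in ivs]
    return [sum(col) / len(ivs) for col in zip(*per_iv)]


def call_genotype(probs, threshold):
    """Return the most probable alt-allele count, or None if below the threshold."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


# Dense genotypes (alt-allele counts) observed on everyone except "kid".
observed = {"gf": 1, "gm": 0, "dad": 1, "mom": 0}
iv_carrier = {"dad": (0, 0), "kid": (0, 0)}      # kid receives dad's allele from gf
iv_noncarrier = {"dad": (0, 0), "kid": (1, 0)}   # kid receives dad's allele from gm
sampled_ivs = [iv_carrier, iv_carrier, iv_carrier, iv_noncarrier]

probs = impute(PEDIGREE, sampled_ivs, observed, "kid", alt_freq=0.01)
print(probs, call_genotype(probs, threshold=0.8))
```

In this toy input, three of the four hand-written IVs transmit the rare allele to the ungenotyped child, so the averaged heterozygote probability is 0.75 and a calling threshold of 0.8 leaves the genotype uncalled. Full enumeration is exponential in the number of founders and serves only to keep the sketch short; it would not scale to the large pedigrees the abstract targets.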