Yang Wen-Yun, Hormozdiari Farhad, Eskin Eleazar, Pasaniuc Bogdan
1 Department of Computer Science, University of California, Los Angeles, California.
J Comput Biol. 2015 May;22(5):451-62. doi: 10.1089/cmb.2014.0151. Epub 2014 Dec 19.
Ever since its introduction, the haplotype copy model has proven to be one of the most successful approaches for modeling genetic variation in human populations, with applications ranging from ancestry inference to genotype phasing and imputation. Motivated by coalescent theory, this approach assumes that any chromosome (haplotype) can be modeled as a mosaic of segments copied from a set of chromosomes sampled from the same population. At the core of the model is the assumption that every chromosome in the sample is a priori equally likely to contribute to the copying process. Building on recent work that models genetic variation in a geographic continuum, we propose a new spatial-aware haplotype copy model that jointly models geography and the haplotype copying process. We extend hidden Markov models of haplotype diversity so that, at any given location, haplotypes that are closest in the genetic-geographic continuum map are a priori more likely to contribute to the copying process than distant ones. Through simulations starting from the 1000 Genomes data, we show that our model achieves higher genotype imputation accuracy than the standard spatial-unaware haplotype copy model. In addition, we show the utility of our model in selecting a small personalized reference panel for imputation that yields both improved accuracy and lower computational runtime than the standard approach. Finally, we show that our proposed model can be used to localize individuals on the genetic-geographic map on the basis of their genotype data.
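The abstract describes the model only at a high level. As a rough illustration of the idea, here is a minimal Python/NumPy sketch under our own assumptions, not the authors' implementation: a Li and Stephens-style copying HMM in which, whenever the copying process switches templates, the new template is drawn from a location-dependent prior instead of a uniform one. The Gaussian distance kernel, the bandwidth `sigma`, the constant switch probability `r`, the mismatch probability `eps`, and the function names `spatial_prior`, `forward_loglik`, and `localize` are all hypothetical. Localization is shown as a simple grid search for the location whose prior maximizes the haplotype likelihood.

```python
import numpy as np


def spatial_prior(query_xy, ref_xy, sigma):
    """Hypothetical spatial copying prior: a Gaussian kernel on the distance
    between the query location and each reference haplotype's location,
    normalized to sum to 1. A very large sigma recovers the uniform prior."""
    d2 = np.sum((ref_xy - query_xy) ** 2, axis=1)
    logw = -d2 / (2.0 * sigma ** 2)
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    return w / w.sum()


def forward_loglik(query, ref_haps, prior, r=0.01, eps=0.01):
    """Scaled forward algorithm for a copying HMM (assumed parameterization).

    query:    (M,) array of 0/1 alleles for the haplotype being modeled
    ref_haps: (K, M) array of 0/1 alleles for the reference haplotypes
    prior:    (K,) copying prior; uniform in the standard model
    r:        per-site probability of switching templates (assumed constant)
    eps:      per-site mismatch (mutation/error) probability
    """
    _, n_sites = ref_haps.shape
    loglik = 0.0
    alpha = prior.copy()  # initial template drawn from the prior
    for m in range(n_sites):
        if m > 0:
            # Keep the current template with prob (1 - r); on a switch,
            # the new template is drawn from the (spatial) prior.
            alpha = (1.0 - r) * alpha + r * prior * alpha.sum()
        emit = np.where(ref_haps[:, m] == query[m], 1.0 - eps, eps)
        alpha = alpha * emit
        scale = alpha.sum()  # P(allele at site m | previous alleles)
        loglik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid underflow
    return loglik


def localize(query, ref_haps, ref_xy, grid, sigma=1.0, **hmm_kwargs):
    """Grid search: return the candidate location whose spatial prior gives
    the highest HMM log-likelihood, together with all scores."""
    scores = [
        forward_loglik(
            query, ref_haps,
            spatial_prior(np.asarray(xy, dtype=float), ref_xy, sigma),
            **hmm_kwargs,
        )
        for xy in grid
    ]
    return grid[int(np.argmax(scores))], scores


# Toy example: two reference haplotypes near (0, 0), two near (10, 0).
ref_haps = np.array([[0, 0, 0, 0, 0],
                     [0, 0, 1, 0, 0],
                     [1, 1, 1, 1, 1],
                     [1, 1, 0, 1, 1]])
ref_xy = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 0.0], [10.5, 0.0]])
query = np.array([0, 0, 0, 0, 1])
grid = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
```

Calling `localize(query, ref_haps, ref_xy, grid)` should favor `(0.0, 0.0)`, because the query closely resembles the haplotypes sampled there. Keeping the switch target tied to the prior makes each forward step O(K) rather than O(K^2); imputation would additionally require the backward pass to obtain per-site copying posteriors, which this sketch omits.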