Zhang Yu, Niu Tianhua, Liu Jun S
Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
Am J Hum Genet. 2006 Aug;79(2):313-22. doi: 10.1086/506276. Epub 2006 Jun 28.
Haplotype inference from phase-ambiguous multilocus genotype data is an important task for both disease-gene mapping and studies of human evolution. We report a novel haplotype-inference method based on a coalescence-guided hierarchical Bayes model. In this model, a hierarchical structure is imposed on the prior haplotype frequency distributions to capture the similarities among modern-day haplotypes attributable to their common ancestry. As a consequence, the model both allows distinct haplotypes to have different a priori probabilities according to the inferred hierarchical ancestral structure and results in a proper joint posterior distribution for all the parameters of interest. A Markov chain Monte Carlo scheme is designed to draw from this posterior distribution. By using coalescence-based simulation and empirically generated data sets (Whitehead Institute's inflammatory bowel disease data sets and HapMap data sets), we demonstrate the merits of the new method in comparison with HAPLOTYPER and PHASE, with or without the presence of recombination hotspots and missing genotypes.
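To make "phase-ambiguous" concrete, the sketch below enumerates every unordered pair of haplotypes consistent with one unphased multilocus genotype. It is an illustration only and not the coalescence-guided hierarchical Bayes method of the paper. The function name `compatible_haplotype_pairs` and the 0/1/2 genotype coding (number of copies of the alternate allele at each biallelic site) are assumptions introduced here.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return all unordered haplotype pairs consistent with an unphased genotype.

    Assumed coding (hypothetical, not from the paper): each entry is the
    number of alternate alleles at a biallelic site, so 0 and 2 are
    homozygous and 1 is heterozygous. Haplotypes are tuples of 0/1 alleles.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    # Each heterozygous site can place its alternate allele on either haplotype.
    for bits in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are fixed: 0 -> allele 0, 2 -> allele 1 on both copies.
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        for site, b in zip(het_sites, bits):
            h1[site] = b
            h2[site] = 1 - b
        # Sort within the pair so that (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    # Two heterozygous sites: the phase cannot be read off the genotype.
    for pair in compatible_haplotype_pairs((1, 1)):
        print(pair)
```

A genotype with k heterozygous sites (k >= 1) admits 2^(k-1) such pairs, which is why statistical methods are used to weigh the candidate pairs against each other across individuals instead of enumerating them one genotype at a time.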