Liu J S, Sabatti C, Teng J, Keats B J, Risch N
Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, USA.
Genome Res. 2001 Oct;11(10):1716-24. doi: 10.1101/gr.194801.
Haplotype analysis of disease chromosomes can help identify probable historical recombination events and localize disease mutations. Most available analyses use only marginal and pairwise allele frequency information. We have developed a Bayesian framework that utilizes full haplotype information to overcome various complications such as multiple founders, unphased chromosomes, data contamination, and incomplete marker data. A stochastic model is used to describe the dependence structure among several variables characterizing the observed haplotypes, for example, the ancestral haplotypes and their ages, mutation rate, recombination events, and the location of the disease mutation. An efficient Markov chain Monte Carlo algorithm was developed for computing the estimates of the quantities of interest. The method is shown to perform well in both real data sets (cystic fibrosis data and Friedreich ataxia data) and simulated data sets. The program that implements the proposed method, BLADE, as well as the two real datasets, can be obtained from http://www.fas.harvard.edu/~junliu/TechRept/01folder/diseq_prog.tar.gz.
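To make the idea of sampling the disease mutation location by Markov chain Monte Carlo concrete, here is a minimal sketch of a random-walk Metropolis sampler. It does not reproduce BLADE or the model in the paper. It assumes a much simpler toy model with a single founder, a known age in generations, known background allele frequencies, and independent markers, where a marker at genetic distance d (in Morgans) from the mutation keeps the ancestral allele with probability exp(-age * d). All names (`log_likelihood`, `sample_location`, the toy data) are hypothetical and chosen only for illustration.

```python
import math
import random


def log_likelihood(x, positions, matches, age, background):
    """Toy log-likelihood of a mutation at location x (Morgans).

    matches[j] holds 0/1 flags, one per disease chromosome, saying whether
    marker j carries the assumed ancestral allele. This is an illustrative
    single-founder model, not the model used by BLADE.
    """
    total = 0.0
    for pos, column, p in zip(positions, matches, background):
        # Chance that no recombination has separated marker and mutation.
        keep = math.exp(-age * abs(pos - x))
        # Otherwise the allele is drawn from the background frequency p.
        prob = keep + (1.0 - keep) * p
        # Clamp so that log() never receives exactly 0.
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)
        for y in column:
            total += math.log(prob if y else 1.0 - prob)
    return total


def sample_location(positions, matches, age, background, lo, hi,
                    n_iter=5000, step=0.005, seed=0):
    """Random-walk Metropolis draws of x under a uniform prior on [lo, hi]."""
    rng = random.Random(seed)
    x = rng.uniform(lo, hi)
    ll = log_likelihood(x, positions, matches, age, background)
    draws = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        # Proposals outside the prior support have zero posterior density.
        if lo <= proposal <= hi:
            ll_new = log_likelihood(proposal, positions, matches, age, background)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                x, ll = proposal, ll_new
        draws.append(x)
    return draws


# Made-up toy data: 5 markers, 20 disease chromosomes, sharing peaks at 0.02.
positions = [0.00, 0.01, 0.02, 0.03, 0.04]
matches = [
    [1] * 12 + [0] * 8,
    [1] * 16 + [0] * 4,
    [1] * 20,
    [1] * 16 + [0] * 4,
    [1] * 12 + [0] * 8,
]
background = [0.3] * 5
age = 50  # generations, assumed known in this sketch

draws = sample_location(positions, matches, age, background, lo=0.0, hi=0.04)
burned = draws[len(draws) // 5:]  # drop the first 20% as burn-in
print("posterior mean location:", sum(burned) / len(burned))
```

The framework in the abstract also treats the ancestral haplotypes, their ages, the mutation rate, and the recombination events as unknowns. This sketch fixes all of them and samples only the location.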