Morris Andrew, Pedder Alan, Ayres Karen
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
Genet Epidemiol. 2003 Sep;25(2):106-14. doi: 10.1002/gepi.10254.
Analyses of high-density single-nucleotide polymorphism (SNP) data, such as genetic mapping and linkage disequilibrium (LD) studies, require phase-known haplotypes to allow for the correlation between tightly linked loci. However, current SNP genotyping technology cannot determine phase, which must be inferred statistically. In this paper, we present a new Bayesian Markov chain Monte Carlo (MCMC) algorithm for population haplotype frequency estimation, particularly in the context of LD assessment. The novel feature of the method is the incorporation of a log-linear prior model for population haplotype frequencies. We present simulations to suggest that 1) the log-linear prior model is more appropriate than the standard coalescent process in the presence of recombination (>0.02 cM between adjacent loci), and 2) there is substantial inflation in measures of LD obtained by a "two-stage" approach to the analysis by treating the "best" haplotype configuration as correct, without regard to uncertainty in the recombination process.
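To make the phase problem and the LD measures mentioned in the abstract concrete, the following is a minimal Python sketch. It is not the authors' MCMC algorithm and does not use their log-linear prior. It only illustrates (a) why unphased SNP genotypes are ambiguous and (b) how the standard pairwise LD measures D, D' and r^2 follow from haplotype frequencies once phase is known or assumed. The function names, the 0/1 allele coding, and the 0/1/2 genotype coding are assumptions made for this illustration.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with one entry per SNP, giving the count (0, 1 or 2)
    of the allele coded 1 (coding is an assumption of this sketch).
    """
    per_site = []
    for g in genotype:
        if g == 0:
            per_site.append([(0, 0)])          # homozygous for allele 0
        elif g == 2:
            per_site.append([(1, 1)])          # homozygous for allele 1
        else:
            per_site.append([(0, 1), (1, 0)])  # heterozygous: phase unknown
    pairs = set()
    for combo in product(*per_site):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.add(tuple(sorted((h1, h2))))     # ignore the order of the pair
    return sorted(pairs)


def ld_measures(freqs):
    """Return (D, D', r^2) for two biallelic SNPs.

    freqs: dict mapping haplotype (a, b), with a, b in {0, 1}, to its
    population frequency; the frequencies are assumed to sum to 1.
    """
    p_a = freqs.get((1, 0), 0.0) + freqs.get((1, 1), 0.0)
    p_b = freqs.get((0, 1), 0.0) + freqs.get((1, 1), 0.0)
    d = freqs.get((1, 1), 0.0) - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

For example, `compatible_haplotype_pairs((1, 1))` returns two candidate configurations for a double heterozygote, and `ld_measures` applied to each candidate's implied frequencies can give very different values. Plugging a single "best" configuration into `ld_measures` corresponds to the "two-stage" approach that the abstract cautions against, because it discards the uncertainty over the alternative configurations.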