Zhu Xiaofeng, Zhang Shuanglin, Tang Hua, Cooper Richard
Department of Preventive Medicine and Epidemiology, Loyola University Medical Center, 2160 S. First Ave, Maywood, IL 60153, USA.
Hum Genet. 2006 Oct;120(3):431-45. doi: 10.1007/s00439-006-0224-z. Epub 2006 Aug 5.
Several disease-mapping methods have recently been proposed that exploit the information generated by recent admixture between populations of historically distinct geographic origins. These methods include both classical likelihood and Bayesian approaches. In this study, we directly maximize the likelihood function of the hidden Markov model (HMM) for admixture mapping using the expectation-maximization (EM) algorithm, allowing for uncertainty in model parameters such as the allele frequencies in the parental populations. We assessed the robustness of the proposed method by examining the estimates of ancestral allele frequencies and of individual, marker-location-specific ancestry when the data were generated under different population admixture models and no learning sample was used. For data generated under various population admixture models, the proposed method outperforms a widely used Bayesian Markov chain Monte Carlo (MCMC) strategy. We derived the multipoint information content for ancestry from the map provided by Smith et al. (2004) and calculated the associated statistical power. We examined the distribution of admixture linkage disequilibrium (LD) across the genome in both real and simulated data and established a threshold for genome-wide significance applicable to admixture mapping studies. The software ADMIXPROGRAM for performing admixture mapping is available from the authors.
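The core computation summarized above, maximizing an HMM likelihood for local ancestry with EM when parental allele frequencies are unknown, can be illustrated with a minimal Python sketch. It is not ADMIXPROGRAM and is not claimed to be the authors' exact model. It assumes two parental populations, haploid (phased) chromosomes, a fixed admixture proportion `m`, a fixed number of generations since admixture `T`, and a common switching model in which ancestry at the next marker is redrawn from (1 - m, m) with probability 1 - exp(-T d) over a genetic distance of d Morgans. All function names are hypothetical. Only the parental allele frequencies are re-estimated: the E-step is a scaled forward-backward pass and the M-step is the closed-form posterior-weighted allele count.

```python
import numpy as np

# Assumptions (illustrative, not the published model): two parental
# populations, haploid chromosomes, admixture proportion m and time since
# admixture T held fixed; only allele frequencies are estimated.

def transition_matrices(dist_morgans, T, m):
    """A[k, i, j] = P(ancestry j at marker k+1 | ancestry i at marker k)."""
    pi = np.array([1.0 - m, m])
    stay = np.exp(-T * np.asarray(dist_morgans, dtype=float))  # no ancestry reset
    eye = np.eye(2)
    # Keep the same ancestry with prob. stay, otherwise redraw it from pi.
    return stay[:, None, None] * eye + (1.0 - stay)[:, None, None] * pi[None, None, :]

def forward_backward(x, p, A, m):
    """Scaled forward-backward for one haploid chromosome.

    x: (K,) alleles coded 0/1; p: (2, K) frequency of allele 1 per population.
    Returns posterior ancestry probabilities (K, 2) and the log-likelihood.
    """
    K = len(x)
    emit = np.where(x[None, :] == 1, p, 1.0 - p).T  # (K, 2) emission probs
    alpha = np.zeros((K, 2))
    c = np.zeros(K)  # scaling factors; their logs sum to the log-likelihood
    alpha[0] = np.array([1.0 - m, m]) * emit[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for k in range(1, K):
        alpha[k] = (alpha[k - 1] @ A[k - 1]) * emit[k]
        c[k] = alpha[k].sum()
        alpha[k] /= c[k]
    beta = np.ones((K, 2))
    for k in range(K - 2, -1, -1):
        beta[k] = A[k] @ (emit[k + 1] * beta[k + 1]) / c[k + 1]
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, float(np.log(c).sum())

def em_allele_freqs(X, dist_morgans, T, m, p_init, n_iter=50):
    """EM for parental allele frequencies (m and T fixed).

    X: (H, K) 0/1 haplotypes. Returns estimated p (2, K) and the
    log-likelihood evaluated at the start of each iteration.
    """
    A = transition_matrices(dist_morgans, T, m)
    p = np.array(p_init, dtype=float)
    trace = []
    for _ in range(n_iter):
        num = np.zeros_like(p)
        den = np.zeros_like(p)
        total_ll = 0.0
        for x in X:
            gamma, ll = forward_backward(x, p, A, m)  # E-step
            num += (gamma * x[:, None]).T  # expected allele-1 counts
            den += gamma.T                 # expected haplotype counts
            total_ll += ll
        trace.append(total_ll)
        # M-step: weighted allele frequency, kept strictly inside (0, 1).
        p = np.clip(num / den, 1e-6, 1.0 - 1e-6)
    return p, trace
```

Because no learning sample fixes which hidden state corresponds to which parental population, the two populations are identifiable only up to a label swap; asymmetric starting values in `p_init` break that symmetry. The recorded log-likelihood should be non-decreasing across iterations, which is a useful sanity check on any EM implementation.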