McKeigue P M, Carpenter J R, Parra E J, Shriver M D
Department of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, UK.
Ann Hum Genet. 2000 Mar;64(Pt 2):171-86. doi: 10.1017/S0003480000008022.
We describe a novel method for analysis of marker genotype data from admixed populations, based on a hybrid of Bayesian and frequentist approaches in which the posterior distribution is generated by Markov chain simulation and score tests are obtained from the missing-data likelihood. We analysed data on unrelated individuals from eight African-American populations, genotyped at ten marker loci of which two (FY and AT3) are linked (22 cM apart). Linkage between these two loci was detected by testing for association of ancestry conditional on parental admixture. The strength of this association was consistent with European gene flow into the African-American population between five and nine generations ago. To mimic the mapping of an unknown gene in an 'affecteds-only' analysis, a binary trait was constructed from the genotype at the AT3 locus and a score test was shown to detect linkage of this 'trait' with the FY locus. Mis-specification of the ancestry-specific allele frequencies (the probabilities of each allelic state given the ancestry of the allele) was detected at three of the ten marker loci. The methods described here have wide application to the analysis of data from admixed populations, allowing the effects of linkage and population structure (variation of admixture between individuals) to be distinguished. With more markers and a more complex statistical model, genes underlying ethnic differences in disease risk could be mapped by this approach.
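The link between the strength of the ancestry association at FY and AT3 and the number of generations since admixture can be illustrated with a simple calculation. The sketch below is not the authors' Markov chain model. It assumes a single pulse of admixture, the Haldane map function, and the standard result that association of ancestry between two loci decays by a factor of (1 - θ) per generation, where θ is the recombination fraction. The function names are hypothetical and chosen only for this illustration.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Convert a map distance in centimorgans to a recombination fraction.

    Assumption: Haldane map function (no crossover interference).
    """
    distance_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def ancestry_association_retained(distance_cm: float, generations: int) -> float:
    """Fraction of the initial between-locus ancestry association that remains.

    Assumption: a single admixture pulse followed by random mating, so the
    association decays by (1 - theta) in each generation.
    """
    theta = haldane_recombination_fraction(distance_cm)
    return (1.0 - theta) ** generations


# FY and AT3 are reported as 22 cM apart; compare the two ends of the
# five-to-nine generation range given in the abstract.
for generations in (5, 9):
    retained = ancestry_association_retained(22.0, generations)
    print(f"{generations} generations: {retained:.3f} of the association retained")
```

Under these assumptions a weaker observed association implies an older admixture event, which is the direction of inference the abstract describes. A model with continuous gene flow would give a different mapping between association strength and time.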