Clark V J, Metheny N, Dean M, Peterson R J
Laboratory of Genomic Diversity, NCI at Frederick, MD 21702, USA.
Hum Genet. 2001 Jun;108(6):484-93. doi: 10.1007/s004390100512.
As more SNP marker data become available, researchers have used haplotypes of markers, rather than individual polymorphisms, for association analysis of candidate genes. To perform haplotype analysis in a population-based case-control study, haplotypes must be estimated, because neither family information nor laboratory methods for establishing phase are available. Here, we test the accuracy of the Expectation-Maximization (EM) algorithm for estimating haplotype state and frequency in the CCR2-CCR5 gene region by comparison with haplotype state and frequency determined by pedigree analysis. To do this, we characterized haplotypes comprising alleles at seven biallelic loci in the CCR2-CCR5 chemokine receptor gene region, a span of 20 kb on chromosome 3p21. Three-generation CEPH families (n=40), totaling 489 individuals, were genotyped by the 5' nuclease assay (TaqMan). Haplotype states and frequencies were compared in 103 grandparents who were assumed to have mated at random. Pedigree analysis and the EM algorithm yielded the same small set of haplotypes, for which linkage disequilibrium was nearly maximal. The haplotype frequencies generated by the two methods were nearly identical. These results suggest that EM estimation of haplotype states and frequencies, combined with linkage disequilibrium analysis, will be an effective strategy in the CCR2-CCR5 gene region. For genetic epidemiology studies, CCR2-CCR5 allele and haplotype frequencies were determined in African-American (n=30), Hispanic (n=24) and European-American (n=34) populations.
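The EM estimation of haplotype frequencies from unphased genotypes can be illustrated with a minimal sketch. This is not the authors' implementation; it is a generic textbook-style EM that assumes Hardy-Weinberg proportions (random mating, as the abstract assumes for the grandparents) and biallelic loci coded as 0, 1, or 2 copies of one allele. The function names `compatible_pairs` and `em_haplotype_frequencies` and the toy genotypes are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with one unphased genotype.

    genotype: tuple with 0, 1, or 2 copies of allele "1" at each biallelic locus.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het_sites:
        hap = tuple(base)
        return [(hap, hap)]
    pairs = []
    # Fix the first heterozygous site to 0 on haplotype 1 so that each
    # unordered pair is enumerated exactly once.
    for bits in product((0, 1), repeat=len(het_sites) - 1):
        h1, h2 = base[:], base[:]
        for site, b in zip(het_sites, (0,) + bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg proportions."""
    pairs_per_person = [compatible_pairs(g) for g in genotypes]
    haps = {h for pairs in pairs_per_person for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pairs_per_person:
            # E-step: posterior probability of each phase resolution.
            weights = [(1 if a == b else 2) * freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by number of chromosomes.
        new_freq = {h: counts[h] / n_chrom for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq


if __name__ == "__main__":
    # Toy two-locus data (hypothetical): only double heterozygotes are ambiguous.
    toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
    for hap, f in sorted(em_haplotype_frequencies(toy).items()):
        print(hap, round(f, 4))
```

In the toy data the two double heterozygotes could carry either 00/11 or 01/10. Because the unambiguous homozygotes contain only 00 and 11, the EM iterations shift nearly all the weight to the 00/11 resolution. This mirrors the situation described in the abstract, where strong linkage disequilibrium leaves only a small number of haplotypes.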