Bonnen Penelope E, Wang Peggy J, Kimmel Marek, Chakraborty Ranajit, Nelson David L
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.
Genome Res. 2002 Dec;12(12):1846-53. doi: 10.1101/gr.483802.
To facilitate association-based linkage studies, we studied the linkage disequilibrium (LD) and haplotype architecture around five genes of interest for cancer risk: ATM, BRCA1, BRCA2, RAD51, and TP53. Single nucleotide polymorphisms (SNPs) were identified and used to construct haplotypes that span 93-200 kb per locus, with an average spacing of one SNP per 12 kb. These markers were genotyped in four ethnically defined population samples of 48 individuals each: African Americans, Asian Americans, Hispanic Americans, and European Americans. Haplotypes were inferred using an expectation-maximization (EM) algorithm, and the data were analyzed using D', r², Fisher's exact P-values, and the four-gamete test for recombination. LD levels varied widely between loci, from continuously high LD across 200 kb to a virtual absence of LD across a similar length of genome. LD structure also varied within each gene and between the populations studied. This variation indicates that the success of linkage-based studies will require a precise description of LD at each locus and in each population to be studied. One striking consistency between genes was that, at each locus, a modest number of haplotypes present in each population accounted for a high fraction of the total number of chromosomes. We conclude that each locus has its own genomic profile with regard to LD, but that there is nonetheless a widespread trend of relatively low haplotype diversity. As a result, a low marker density should be adequate to identify haplotypes that represent the common variation at a locus, thereby decreasing the cost and increasing the efficacy of association studies.
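The pairwise measures named in the abstract (D', r², and the four-gamete test) can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes phased haplotypes for two biallelic SNPs are already available as two-character strings, whereas the study inferred haplotypes from genotypes with an EM algorithm. The function name `ld_stats` and the input encoding are hypothetical, chosen only for illustration.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (|D'|, r^2, four_gametes) for two biallelic SNPs.

    haplotypes: list of 2-character strings, one per chromosome, e.g. "AB".
    The first character is the allele at SNP 1, the second at SNP 2.
    (Hypothetical input format, assumed for this sketch.)
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)

    alleles_1 = sorted({h[0] for h in haplotypes})
    alleles_2 = sorted({h[1] for h in haplotypes})
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both SNPs must be biallelic and polymorphic")

    # Pick one reference allele per SNP; |D'| and r^2 do not depend on the choice.
    a, b = alleles_1[0], alleles_2[0]
    p_a = sum(1 for h in haplotypes if h[0] == a) / n
    p_b = sum(1 for h in haplotypes if h[1] == b) / n
    p_ab = counts[a + b] / n

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D' normalizes D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    # r^2: squared correlation between the alleles at the two SNPs.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

    # Four-gamete test: observing all four haplotypes suggests recombination
    # (or recurrent mutation) between the two sites.
    four_gametes = len(counts) == 4

    return d_prime, r2, four_gametes


# Three of four possible haplotypes observed: |D'| is 1 (up to rounding),
# yet r^2 is well below 1.
sample = ["AB"] * 6 + ["aB"] * 2 + ["ab"] * 2
d_prime, r2, four_gametes = ld_stats(sample)
print(round(d_prime, 3), round(r2, 3), four_gametes)
```

In the example, only three haplotypes are present, so the four-gamete test gives no evidence of recombination and |D'| reaches its maximum, while r² stays lower because the allele frequencies at the two SNPs differ. This difference is one reason studies often report both measures.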