Nicolae Dan L, Wen Xiaoquan, Voight Benjamin F, Cox Nancy J
Department of Statistics, The University of Chicago, Chicago, Illinois, USA.
PLoS Genet. 2006 May;2(5):e67. doi: 10.1371/journal.pgen.0020067. Epub 2006 May 5.
Improvements in technology have made it possible to conduct genome-wide association mapping at costs within reach of academic investigators, and experiments are currently being conducted with a variety of high-throughput platforms. To provide an appropriate context for interpreting results of such studies, we summarize here results of an investigation of one of the first of these technologies to be publicly available, the Affymetrix GeneChip Human Mapping 100K set of single nucleotide polymorphisms (SNPs). In a systematic analysis of the pattern and distribution of SNPs in the Mapping 100K set, we find that SNPs in this set are undersampled from coding regions (both nonsynonymous and synonymous) and oversampled from regions outside genes, relative to SNPs in the overall HapMap database. In addition, we utilize a novel multilocus linkage disequilibrium (LD) coefficient based on information content (analogous to the information content scores commonly used for linkage mapping) that is equivalent to the familiar measure r² in the special case of two loci. Using this approach, we are able to summarize for any subset of markers, such as the Affymetrix Mapping 100K set, the information available for association mapping in that subset, relative to the information available in the full set of markers included in the HapMap, and highlight circumstances in which this multilocus measure of LD provides substantial additional insight about the haplotype structure in a region over pairwise measures of LD.
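For reference, the pairwise special case named in the abstract is the standard two-locus r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), with D = p_AB − p_A p_B. The sketch below computes it from phased haplotypes. It does not implement the multilocus information-based coefficient, whose definition is not given in this record. The function name, the 0/1 allele coding, and the toy haplotypes are assumptions made for illustration only.

```python
def pairwise_r2(haplotypes):
    """Standard two-locus r^2 from phased haplotypes.

    haplotypes: list of (a, b) tuples, with alleles coded 0/1 at
    locus A and locus B (an illustrative encoding, not from the paper).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    # Frequencies of the '1' allele at each locus and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")

    d = p_ab - p_a * p_b  # classical LD coefficient D
    return d * d / denom


# Toy data (hypothetical): two loci in complete LD, and two loci in equilibrium.
perfect = [(1, 1)] * 5 + [(0, 0)] * 5
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(pairwise_r2(perfect))
print(pairwise_r2(independent))
```

In the first toy set the two loci always carry the same allele, so one marker fully predicts the other. In the second, all four haplotypes are equally frequent and the markers carry no information about each other.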