Wollstein Andreas, Herrmann Alexander, Wittig Michael, Nothnagel Michael, Franke Andre, Nürnberg Peter, Schreiber Stefan, Krawczak Michael, Hampe Jochen
Cologne Center for Genomics, Cologne, Germany.
Nucleic Acids Res. 2007;35(17):e113. doi: 10.1093/nar/gkm621. Epub 2007 Aug 28.
The power of a genome-wide disease association study depends critically upon the properties of the marker set used, particularly the number and physical spacing of markers, and the level of inter-marker association due to linkage disequilibrium. Extending our previously devised theoretical framework for the entropy-based selection of genetic markers, we have developed a local measure of the efficacy of a marker set, expressed relative to the efficacy obtained by including a maximally polymorphic single nucleotide polymorphism (SNP) at the map position of interest. Using this quantitative criterion, we evaluated five currently available SNP sets, namely Affymetrix 100K and 500K, and Illumina 100K, 300K and 550K, in the CEU, YRI and JPT + CHB HapMap populations. At 50% relative efficacy, the commercial marker sets cover between 19% and 68% of the human genome, depending upon the population under study. By contrast, an optimal, technology-independent 500K marker set constructed from HapMap data for Caucasians would achieve 73% coverage at the same relative efficacy.
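As an illustration of how a "coverage at 50% relative efficacy" figure might be read, the following minimal sketch computes the fraction of evaluated map positions whose relative efficacy reaches a chosen threshold. It does not reproduce the entropy-based efficacy measure itself, which the abstract does not define. The function name `coverage_at_threshold`, the toy values, the use of equally weighted positions and the choice of "greater than or equal to" at the threshold are all assumptions made for illustration.

```python
from typing import Sequence


def coverage_at_threshold(efficacy: Sequence[float], threshold: float = 0.5) -> float:
    """Return the fraction of map positions with relative efficacy >= threshold.

    Assumption: each entry is the relative efficacy (0..1) at one equally
    weighted map position; the real study may weight positions differently.
    """
    if not efficacy:
        raise ValueError("need at least one map position")
    covered = sum(1 for value in efficacy if value >= threshold)
    return covered / len(efficacy)


# Hypothetical relative-efficacy values at eight map positions (not study data).
toy_profile = [0.9, 0.7, 0.5, 0.4, 0.1, 0.8, 0.3, 0.6]

# Five of the eight toy positions reach the 0.5 threshold.
print(coverage_at_threshold(toy_profile))
```

Under these assumptions, a reported coverage of 68% would mean that 68% of the evaluated positions reach at least half the efficacy that a maximally polymorphic SNP placed at that position would provide.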