Department of Occupational and Environmental Medicine, Lund University Hospital, Lund, Sweden.
Int J Epidemiol. 2009 Oct;38(5):1364-73. doi: 10.1093/ije/dyp285. Epub 2009 Sep 4.
The P-value approach has been used to prioritize genome-wide association (GWA) scan signals, with genome-wide significance defined by a pre-specified P-value threshold, although this is not ideal. One rationale put forward is that association signals should be expected to give less support for rare single nucleotide polymorphisms (SNPs), whose tests have low power, than for common SNPs with equivalent P-values, unless investigators believe, a priori, that rare causative variants contribute to the disease and have more pronounced effects.
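To illustrate why tests of rare SNPs tend to have low power, the following is a minimal sketch of an approximate power calculation for a two-sided allelic (two-proportion) test. It borrows the case and control counts quoted in this abstract; the odds ratio, the normal approximation, and the function name `allelic_test_power` are illustrative assumptions, not the authors' method.

```python
from statistics import NormalDist

# Sample sizes quoted in the abstract; everything else here is assumed.
N_CASES, N_CONTROLS = 1924, 2938


def allelic_test_power(control_maf, odds_ratio, alpha=5e-7):
    """Approximate power of a two-sided allelic test (normal approximation).

    control_maf: minor allele frequency among controls.
    odds_ratio:  assumed per-allele odds ratio (hypothetical input).
    alpha:       significance threshold for the test.
    """
    std_normal = NormalDist()
    p0 = control_maf
    # Allele frequency among cases implied by the per-allele odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    # Each person contributes two alleles.
    n1, n0 = 2 * N_CASES, 2 * N_CONTROLS
    se = (p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Only the tail in the direction of the effect is counted.
    return std_normal.cdf(abs(p1 - p0) / se - z_crit)


if __name__ == "__main__":
    for maf in (0.05, 0.10, 0.30):
        print(f"control MAF {maf:.2f}: power ~ {allelic_test_power(maf, 1.3):.3f}")
```

Under these assumptions, and for a fixed odds ratio, the power drops sharply as the control MAF decreases.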
Using data from a GWA scan for type 2 diabetes (1924 cases, 2938 controls, 393 453 SNPs), we compared P-values with four alternative signal measures: likelihood ratio (LR), Bayes factor (BF; with a specified prior distribution for true effects), 'frequentist factor' (FF; reflecting the ratio between estimated, post-data, 'power' and P-value) and probability of pronounced effect size (PrPES).
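As a rough illustration of how a Bayes factor depends on the prior for true effects, the sketch below uses Wakefield's approximate Bayes factor with a zero-mean normal prior on the log odds ratio. This is a swapped-in, widely used approximation. The abstract does not state the prior or the BF computation used in the study, so the function name `log10_bf_alt_vs_null` and the default `prior_sd=0.2` are assumptions chosen only for demonstration.

```python
import math


def log10_bf_alt_vs_null(z, se, prior_sd=0.2):
    """log10 approximate Bayes factor for association versus no association.

    z:        Wald statistic (estimated log odds ratio divided by its standard error).
    se:       standard error of the estimated log odds ratio.
    prior_sd: assumed standard deviation of the N(0, prior_sd^2) prior on the
              true log odds ratio (hypothetical value, not from the study).
    """
    v = se ** 2        # sampling variance of the estimate
    w = prior_sd ** 2  # prior variance of the true effect
    shrink = w / (v + w)
    # Wakefield's approximation, written for the alternative over the null.
    log_bf = 0.5 * z ** 2 * shrink - 0.5 * math.log((v + w) / v)
    return log_bf / math.log(10)


if __name__ == "__main__":
    # Same z (hence the same P-value), different standard errors.
    print("smaller se:", round(log10_bf_alt_vs_null(5.0, 0.05), 2))
    print("larger se: ", round(log10_bf_alt_vs_null(5.0, 0.20), 2))
```

With this particular prior, two signals with identical P-values receive different Bayes factors when their standard errors differ, as is typically the case for rare versus common SNPs. A different prior would change the size, and possibly the direction, of that difference.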
The 19 common SNPs [minor allele frequency (MAF) among the controls >29%] yielding strong P-value signals (P < 5 × 10^-7) were also top ranked by the other approaches. There was a strong similarity between the P-value, LR and BF signals in terms of ranking SNPs. In contrast, FF and PrPES signals down-weighted rare SNPs (control MAF <10%) with low P-values.
For prioritization of signals that do not achieve compelling levels of evidence for association, the main driving force behind observed differences between the various association signals appears to be SNP MAF. The statistical power afforded by follow-up samples for establishing replication should be taken into account when tailoring the signal selection strategy.