Graffelman Jan, Moreno Victor
Department of Statistics and Operations Research, Universitat Politècnica de Catalunya, Avinguda Diagonal 647, Barcelona, Spain
Stat Appl Genet Mol Biol. 2013 Aug;12(4):433-48. doi: 10.1515/sagmb-2012-0039.
Exact tests for Hardy-Weinberg equilibrium are widely used in genetic association studies. We evaluate the mid p-value, unknown in the genetics literature, as an alternative to the standard p-value in the exact test.
The type 1 error rate and the power of the exact test are calculated for different sample sizes, significance levels, minor allele counts and degrees of deviation from equilibrium. Three p-values are considered: the standard two-sided p-value, the doubled one-sided p-value and the mid p-value. Practical implications of using the mid p-value are discussed with HapMap data sets and a data set on colon cancer.
The mid p-value is shown to have a type 1 error rate that is always closer to the nominal level, and to have better power. Differences between the standard p-value and the mid p-value can be large for non-significant results, and are smaller for significant results. The analysis of empirical databases shows that the mid p-value uncovers more significant markers, and that the equilibrium null distribution is not tenable for either database.
The standard exact p-value is overly conservative, in particular for small minor allele frequencies. The mid p-value ameliorates this problem by bringing the rejection rate closer to the nominal level, at the price of occasionally exceeding the nominal level.
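
The three p-values compared above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the usual conditional distribution of the heterozygote count given the sample size and the allele count, and it assumes the mid p-value is computed as the standard two-sided p-value minus half the probability of the observed sample. The function names (het_distribution, exact_pvalues, rejection_rate) and the genotype counts in the usage lines are hypothetical and chosen only for illustration.

```python
from math import exp, lgamma, log


def het_distribution(n, n_a):
    """P(n_ab | n, n_a) under HWE for every admissible heterozygote count.

    n is the number of individuals, n_a the count of allele A (out of 2n).
    """
    n_b = 2 * n - n_a
    probs = {}
    # The heterozygote count must have the same parity as the allele count.
    for n_ab in range(n_a % 2, min(n_a, n_b) + 1, 2):
        n_aa = (n_a - n_ab) // 2
        n_bb = (n_b - n_ab) // 2
        # Log of n! n_a! n_b! 2^n_ab / (n_aa! n_ab! n_bb! (2n)!)
        log_p = (lgamma(n + 1) + lgamma(n_a + 1) + lgamma(n_b + 1)
                 + n_ab * log(2)
                 - lgamma(n_aa + 1) - lgamma(n_ab + 1) - lgamma(n_bb + 1)
                 - lgamma(2 * n + 1))
        probs[n_ab] = exp(log_p)
    return probs


def exact_pvalues(n, n_a, n_ab_obs):
    """Standard two-sided, doubled one-sided and mid p-value (assumed forms)."""
    probs = het_distribution(n, n_a)
    p_obs = probs[n_ab_obs]
    tol = 1 + 1e-9  # treat floating-point near-ties as ties
    # Standard: all samples that are as likely as, or less likely than, the observed one.
    standard = min(1.0, sum(p for p in probs.values() if p <= p_obs * tol))
    lower = sum(p for k, p in probs.items() if k <= n_ab_obs)
    upper = sum(p for k, p in probs.items() if k >= n_ab_obs)
    return {
        "standard": standard,
        "doubled_one_sided": min(1.0, 2 * min(lower, upper)),
        # Assumption: the observed sample counts for only half its probability.
        "mid": standard - 0.5 * p_obs,
    }


def rejection_rate(n, n_a, alpha, kind):
    """Exact type 1 error rate: null probability of all samples rejected at alpha."""
    probs = het_distribution(n, n_a)
    return sum(p for k, p in probs.items()
               if exact_pvalues(n, n_a, k)[kind] <= alpha)


# Hypothetical genotype counts (AA, AB, BB), for illustration only.
n_aa, n_ab, n_bb = 1, 10, 89
n = n_aa + n_ab + n_bb
n_a = 2 * n_aa + n_ab
print(exact_pvalues(n, n_a, n_ab))
print(rejection_rate(n, n_a, 0.05, "standard"),
      rejection_rate(n, n_a, 0.05, "mid"))
```

Under these assumptions the mid p-value can never exceed the standard p-value, so at a given significance level the test based on it rejects at least as often. This is consistent with a rejection rate that moves closer to the nominal level and can occasionally exceed it.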