Yu Chang, Zelterman Daniel
Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, U.S.A.
Department of Biostatistics, Yale University, New Haven, CT 06520, U.S.A.
Comput Stat Data Anal. 2017 Oct;114:105-118. doi: 10.1016/j.csda.2017.04.008. Epub 2017 Apr 29.
Microarray studies generate large numbers of p-values from many gene expression comparisons. Estimating the proportion of these p-values that arise under the null hypothesis is of broad interest, and a two-component mixture model is often used for this purpose. When the data are generated under the null hypothesis, the p-values follow a uniform distribution. What is the distribution of the p-values when the data are sampled under the alternative hypothesis? This distribution is derived for the chi-squared test and then used, in a parametric framework, to estimate the proportion of p-values sampled from the null hypothesis. Simulation studies evaluate the performance of the new method in comparison with five recent methods. The new method performs robustly even in scenarios with clusters of correlated p-values and with a multicomponent or continuous mixture in the alternative. The methods are demonstrated through an analysis of a real microarray dataset.
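The two-component mixture idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: a 1-degree-of-freedom chi-squared test (the square of a statistic Z ~ N(mu, 1)), a single shared noncentrality mu**2 for all alternatives, independent p-values, and a coarse grid search in place of a proper optimizer. Under these assumptions the p-value density under the alternative has the closed form g(p; mu) = exp(-mu^2/2) cosh(mu z_p), where z_p is the upper p/2 quantile of the standard normal, and the mixture density is pi0 + (1 - pi0) g(p; mu). The names `alt_density`, `simulate_pvalues`, and `fit_mixture` are hypothetical helpers introduced only for this illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_from_p(p):
    """Return |z| such that the two-sided p-value of |z| equals p."""
    # Clamp away from 0 so inv_cdf stays inside its open domain (0, 1).
    return -STD_NORMAL.inv_cdf(max(p, 1e-300) / 2.0)


def alt_density(p, mu):
    """p-value density when Z ~ N(mu, 1) (assumed 1-df chi-squared test).

    Ratio of the density of |Z| under the alternative to that under the
    null, which simplifies to exp(-mu^2 / 2) * cosh(mu * z_p).
    """
    return math.exp(-0.5 * mu * mu) * math.cosh(mu * z_from_p(p))


def simulate_pvalues(n, pi0, mu, rng):
    """Draw n p-values: uniform with probability pi0, else from the alternative."""
    pvals = []
    for _ in range(n):
        if rng.random() < pi0:
            pvals.append(rng.random())
        else:
            z = rng.gauss(mu, 1.0)
            pvals.append(2.0 * STD_NORMAL.cdf(-abs(z)))
    return pvals


def fit_mixture(pvals, pi0_grid, mu_grid):
    """Grid-search maximum likelihood for (pi0, mu) in the mixture model."""
    zs = [z_from_p(p) for p in pvals]
    best_ll, best_pi0, best_mu = -math.inf, None, None
    for mu in mu_grid:
        scale = math.exp(-0.5 * mu * mu)
        g = [scale * math.cosh(mu * z) for z in zs]
        for pi0 in pi0_grid:
            ll = sum(math.log(pi0 + (1.0 - pi0) * gi) for gi in g)
            if ll > best_ll:
                best_ll, best_pi0, best_mu = ll, pi0, mu
    return best_pi0, best_mu


# Illustrative settings only (not taken from the paper).
rng = random.Random(2017)
pvals = simulate_pvalues(n=1000, pi0=0.8, mu=3.0, rng=rng)
pi0_grid = [i / 50 for i in range(51)]          # 0.00, 0.02, ..., 1.00
mu_grid = [0.25 * k for k in range(1, 21)]      # 0.25, 0.50, ..., 5.00
pi0_hat, mu_hat = fit_mixture(pvals, pi0_grid, mu_grid)
print(f"estimated pi0 = {pi0_hat:.2f}, estimated mu = {mu_hat:.2f}")
```

The mu grid starts above zero because at mu = 0 the alternative density is itself uniform and pi0 is no longer identifiable. A realistic analysis would need to handle general degrees of freedom, heterogeneous effect sizes, and correlated p-values, which this sketch does not attempt.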