Institute for Statistics, University of Bremen, Bremen, Germany.
Biom J. 2024 Jan;66(1):e2300077. doi: 10.1002/bimj.202300077. Epub 2023 Oct 19.
P-values that are derived from continuously distributed test statistics are typically uniformly distributed on (0,1) under least favorable parameter configurations (LFCs) in the null hypothesis. Conservativeness of a p-value P (meaning that, under the null hypothesis, P is stochastically larger than uniform on (0,1)) can occur if the test statistic from which P is derived is discrete, or if the true parameter value under the null is not an LFC. To deal with both of these sources of conservativeness, we present two approaches utilizing randomized p-values. We illustrate their effectiveness for testing a composite null hypothesis under a binomial model. We also give an example of how the proposed p-values can be used to test a composite null in group testing designs. We find that the proposed randomized p-values are less conservative than nonrandomized p-values under the null hypothesis, but that they are stochastically not smaller under the alternative. The problem of establishing the validity of randomized p-values has received attention in previous literature. We show that our proposed randomized p-values are valid under various discrete statistical models in which the distribution of the corresponding test statistic belongs to an exponential family. We also investigate how the power function of the tests based on the proposed randomized p-values behaves as a function of the sample size. Simulations and a real data example are used to compare the different p-values considered.
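The two sources of conservativeness named above can be made concrete with a small sketch. It is not the authors' proposed construction: it uses the classical randomized p-value for a one-sided binomial test of H0: theta <= theta0, which removes only the conservativeness caused by discreteness. The function names, and the values n = 20 and theta0 = 0.5 in the usage note, are illustrative assumptions.

```python
import math


def binom_pmf(k, n, theta):
    """P(X = k) for X ~ Binomial(n, theta)."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)


def upper_tail(x, n, theta):
    """P(X >= x) for X ~ Binomial(n, theta); returns 0 when x > n."""
    return sum(binom_pmf(k, n, theta) for k in range(x, n + 1))


def nonrandomized_p(x, n, theta0):
    """Usual one-sided p-value, evaluated at the LFC theta0."""
    return upper_tail(x, n, theta0)


def randomized_p(x, n, theta0, u):
    """Classical randomized p-value (an assumption of this sketch, not
    the paper's proposal): P(X > x) + u * P(X = x), with u drawn from
    Uniform(0, 1) independently of the data."""
    return upper_tail(x + 1, n, theta0) + u * binom_pmf(x, n, theta0)


def nonrandomized_p_cdf(t, n, theta0, theta):
    """Exact P_theta(P <= t) for the nonrandomized p-value."""
    return sum(
        binom_pmf(x, n, theta)
        for x in range(n + 1)
        if upper_tail(x, n, theta0) <= t
    )


def randomized_p_cdf(t, n, theta0, theta):
    """Exact P_theta(P_rand <= t), integrating out u ~ Uniform(0, 1).
    Requires 0 < theta0 < 1 so that every pmf value is positive."""
    total = 0.0
    for x in range(n + 1):
        lower = upper_tail(x + 1, n, theta0)   # value of P_rand at u = 0
        width = binom_pmf(x, n, theta0)        # range covered as u -> 1
        frac = min(1.0, max(0.0, (t - lower) / width))
        total += binom_pmf(x, n, theta) * frac
    return total
```

With n = 20 and theta0 = 0.5, `randomized_p_cdf(0.05, 20, 0.5, 0.5)` equals 0.05 up to floating-point error, whereas `nonrandomized_p_cdf(0.05, 20, 0.5, 0.5)` is strictly smaller: randomization removes the discreteness gap at the LFC. Evaluating `randomized_p_cdf(0.05, 20, 0.5, 0.3)` gives a value below 0.05, which shows the second source of conservativeness (a true parameter that is not an LFC) that this classical construction does not address.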