Lloyd Chris J
University of Melbourne, Carlton, Australia.
Biometrics. 2010 Sep;66(3):975-82. doi: 10.1111/j.1541-0420.2009.01354.x.
Clinical trials data often come in the form of low-dimensional tables of small counts. Standard approximate tests such as score and likelihood ratio tests are imperfect in several respects. First, they can give quite different answers from the same data. Second, the actual type 1 error can differ significantly from the nominal level, even for quite large sample sizes. Third, exact inferences based on these tests can be strongly nonmonotonic functions of the null parameter and lead to confidence sets that are discontiguous. There are two modern approaches to small-sample inference. One is to use so-called higher order asymptotics (Reid, 2003, Annals of Statistics 31, 1695-1731) to provide an explicit adjustment to the likelihood ratio statistic. The theory for this is complex, but the statistic is quick to compute. The second approach is to perform an exact calculation of significance assuming the nuisance parameters equal their null estimate (Lee and Young, 2005, Statistics and Probability Letters 71, 143-153), which is a kind of parametric bootstrap. The purpose of this article is to explain and evaluate these two methods for testing whether a difference in probabilities p2 - p1 exceeds a prechosen noninferiority margin δ0. On the basis of an extensive numerical study, we recommend bootstrap P-values as superior to all other alternatives. First, they produce practically identical answers regardless of the basic test statistic chosen. Second, they have excellent size accuracy and higher power. Third, they vary much less erratically with the null parameter value δ0.
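The second approach can be illustrated with a minimal Python sketch. It is not the article's code: the function names, the ternary-search fit of the null estimate, and the choice of a score-type statistic are all assumptions made for illustration. For two independent binomial samples it estimates the nuisance parameter p1 under the constraint p2 = p1 + δ0, then sums the null probabilities of every table whose statistic is at least as large as the observed one.

```python
import math


def xlogy(x, y):
    """x * log(y), with 0 * log(0) taken as 0 and log of y <= 0 as -inf."""
    if x == 0:
        return 0.0
    return x * math.log(y) if y > 0 else -math.inf


def restricted_mle(x1, n1, x2, n2, delta0, iters=100):
    """Estimate p1 by maximum likelihood under the constraint p2 = p1 + delta0.

    The constrained log-likelihood is concave in p1, so a ternary search
    over the admissible interval is enough for this sketch.
    """
    lo, hi = max(0.0, -delta0), min(1.0, 1.0 - delta0)

    def loglik(p1):
        p2 = p1 + delta0
        return (xlogy(x1, p1) + xlogy(n1 - x1, 1.0 - p1)
                + xlogy(x2, p2) + xlogy(n2 - x2, 1.0 - p2))

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def score_stat(x1, n1, x2, n2, delta0):
    """Score-type statistic: (observed difference - delta0) / null standard error."""
    p1t = restricted_mle(x1, n1, x2, n2, delta0)
    p2t = p1t + delta0
    var = p1t * (1.0 - p1t) / n1 + p2t * (1.0 - p2t) / n2
    if var <= 0.0:  # degenerate table: no information either way
        return 0.0
    return (x2 / n2 - x1 / n1 - delta0) / math.sqrt(var)


def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def bootstrap_p_value(x1, n1, x2, n2, delta0):
    """Exact tail probability of the statistic with p1 fixed at its null estimate."""
    t_obs = score_stat(x1, n1, x2, n2, delta0)
    p1t = restricted_mle(x1, n1, x2, n2, delta0)
    p2t = min(max(p1t + delta0, 0.0), 1.0)  # guard against rounding outside [0, 1]
    total = 0.0
    for y1 in range(n1 + 1):
        for y2 in range(n2 + 1):
            # Small tolerance so that ties with the observed table are counted.
            if score_stat(y1, n1, y2, n2, delta0) >= t_obs - 1e-9:
                total += binom_pmf(y1, n1, p1t) * binom_pmf(y2, n2, p2t)
    return total


# Hypothetical counts, for illustration only (not data from the article).
p = bootstrap_p_value(x1=14, n1=20, x2=15, n2=20, delta0=-0.1)
```

Because the tail probability is obtained by full enumeration rather than simulation, the result is deterministic. The cost grows with (n1 + 1)(n2 + 1), so the sketch is practical only for the small tables the abstract has in mind.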