Crans Gerald G, Shuster Jonathan J
Department of Biostatistics, Amgen, Thousand Oaks, CA, USA.
Stat Med. 2008 Aug 15;27(18):3598-611. doi: 10.1002/sim.3221.
The debate over which statistical methodology is most appropriate for analyzing the two-sample comparative binomial trial has persisted for decades. Practitioners who favor Fisher's conditional method, Fisher's exact test (FET), argue that only experimental outcomes containing the same amount of information should be considered when performing analyses; hence, the total number of successes should be fixed at its observed level in hypothetical repetitions of the experiment. Using conditional methods in clinical settings can pose interpretation difficulties, since results are derived from conditional sample spaces rather than from the set of all possible outcomes. Perhaps more importantly from a clinical trial design perspective, the test can be too conservative, resulting in greater resource requirements and more subjects exposed to an experimental treatment. The actual significance level attained by FET (the size of the test) has not been reported in the statistical literature. Berger (J. R. Statist. Soc. D (The Statistician) 2001; 50:79-85) proposed assessing the conservativeness of conditional methods using p-value confidence intervals. In this paper we develop a numerical algorithm that calculates the size of FET at the two-sided significance level alpha = 0.05 for sample sizes n of up to 125 per group. Additionally, this numerical method is used to define, for each n, a new significance level alpha* = alpha + epsilon, where epsilon is a small positive number, such that the size of the test is as close as possible to the pre-specified alpha (0.05 in the current work) without exceeding it. Lastly, a sample size and power calculation example is presented, demonstrating the statistical advantages of implementing the adjustment to FET (using alpha* instead of alpha) in the two-sample comparative binomial trial.
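The two quantities described above, the size of FET and the adjusted level alpha*, can be illustrated with a small brute-force sketch. It is not the authors' numerical algorithm. It assumes the common "minimum likelihood" two-sided FET p-value, takes the size as the supremum over the common null success probability p of the unconditional rejection probability, and approximates that supremum by a grid search (which can slightly underestimate it). All function names are hypothetical, and the enumeration is only practical for small n.

```python
from math import comb


def fet_pvalue(x1, x2, n1, n2):
    """Two-sided Fisher exact p-value for x1/n1 vs x2/n2 successes.

    Conditions on s = x1 + x2 and sums the hypergeometric probabilities of all
    tables no more likely than the observed one (assumed two-sided rule).
    Integer weights keep ties exact.
    """
    s = x1 + x2
    lo, hi = max(0, s - n2), min(s, n1)
    weights = [comb(n1, k) * comb(n2, s - k) for k in range(lo, hi + 1)]
    w_obs = comb(n1, x1) * comb(n2, x2)
    return sum(w for w in weights if w <= w_obs) / comb(n1 + n2, s)


def sorted_tables(n):
    """All outcomes (p-value, x1, x2) for n subjects per group, by ascending p-value."""
    return sorted((fet_pvalue(x1, x2, n, n), x1, x2)
                  for x1 in range(n + 1) for x2 in range(n + 1))


def sup_rejection_prob(coef, n, grid):
    """Grid approximation of sup over p of P(reject | p1 = p2 = p).

    coef[s] is the sum of C(n, x1) * C(n, x2) over rejected tables with
    x1 + x2 = s, so P(reject | p) = sum_s coef[s] * p**s * (1 - p)**(2n - s).
    """
    best = 0.0
    for i in range(1, grid):
        p = i / grid
        prob = sum(c * p ** s * (1 - p) ** (2 * n - s)
                   for s, c in enumerate(coef) if c)
        best = max(best, prob)
    return best


def fet_size(n, alpha, grid=1000):
    """Approximate (unconditional) size of the two-sided FET at nominal level alpha."""
    coef = [0] * (2 * n + 1)
    for pv, x1, x2 in sorted_tables(n):
        if pv <= alpha:
            coef[x1 + x2] += comb(n, x1) * comb(n, x2)
    return sup_rejection_prob(coef, n, grid)


def adjusted_alpha(n, alpha=0.05, grid=1000):
    """Largest attained p-value level whose test still has size <= alpha.

    Rejection regions are nested in the nominal level, so the size is
    nondecreasing and the search stops at the first level that exceeds alpha.
    """
    tables = sorted_tables(n)
    coef = [0] * (2 * n + 1)
    best, i = 0.0, 0
    while i < len(tables):
        level = tables[i][0]
        # tables sharing this p-value enter the rejection region together
        while i < len(tables) and tables[i][0] == level:
            _, x1, x2 = tables[i]
            coef[x1 + x2] += comb(n, x1) * comb(n, x2)
            i += 1
        if sup_rejection_prob(coef, n, grid) > alpha:
            break
        best = level
    return best


if __name__ == "__main__":
    n = 10
    print("approximate size at 0.05:", fet_size(n, 0.05))
    print("adjusted level:", adjusted_alpha(n))
```

Because FET p-values take only finitely many values, any nominal level from the returned value up to (but not including) the next attained p-value yields the same rejection region, which is why the adjusted level can be written as alpha plus a small epsilon whenever it exceeds 0.05.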