A cautionary note on exact unconditional inference for a difference between two independent binomial proportions.

Author information

Devan V. Mehrotra, Ivan S. F. Chan, Roger L. Berger

Affiliation

Merck Research Laboratories, UN-A102, 785 Jolly Rd., Bldg. C, Blue Bell, Pennsylvania 19422, USA.

Publication information

Biometrics. 2003 Jun;59(2):441-50. doi: 10.1111/1541-0420.00051.

PMID:12926729

Abstract

Fisher's exact test for comparing response proportions in a randomized experiment can be overly conservative when the group sizes are small or when the response proportions are close to zero or one. This is primarily because the null distribution of the test statistic becomes too discrete, a partial consequence of the inference being conditional on the total number of responders. Accordingly, exact unconditional procedures have gained in popularity, on the premise that power will increase because the null distribution of the test statistic will presumably be less discrete. However, we caution researchers that a poor choice of test statistic for exact unconditional inference can actually result in a substantially less powerful analysis than Fisher's conditional test. To illustrate, we study a real example and provide exact test size and power results for several competing tests, for both balanced and unbalanced designs. Our results reveal that Fisher's test generally outperforms exact unconditional tests based on using as the test statistic either the observed difference in proportions, or the observed difference divided by its estimated standard error under the alternative hypothesis, the latter for unbalanced designs only. On the other hand, the exact unconditional test based on the observed difference divided by its estimated standard error under the null hypothesis (score statistic) outperforms Fisher's test, and is recommended. Boschloo's test, in which the p-value from Fisher's test is used as the test statistic in an exact unconditional test, is uniformly more powerful than Fisher's test, and is also recommended.
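The sketch below is an illustration, not the authors' code. It shows how the tests named in the abstract differ for a single 2×2 table of responders x1/n1 versus x2/n2. It makes several assumptions. The alternative is one-sided (p1 > p2). The pooled (score) Z is set to 0 when no subject or every subject responds. The supremum over the common null response probability is approximated on a grid of 1,001 points instead of being found by exact optimization. The unpooled (alternative-hypothesis) standard-error statistic is left out because its zero-variance cases need an extra convention. All function names and the example counts are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fisher_one_sided(x1, n1, x2, n2):
    """Fisher's conditional p-value P(X1 >= x1 | X1 + X2 = m), kept as an exact Fraction."""
    m = x1 + x2
    # math.comb returns 0 when m - k > n2, so impossible tables contribute nothing.
    num = sum(comb(n1, k) * comb(n2, m - k) for k in range(x1, min(n1, m) + 1))
    return Fraction(num, comb(n1 + n2, m))


def score_z(y1, n1, y2, n2):
    """Difference in proportions over its standard error under H0 (pooled estimate)."""
    p_pool = (y1 + y2) / (n1 + n2)
    if p_pool in (0.0, 1.0):
        return 0.0  # assumed convention for a degenerate pooled proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (y1 / n1 - y2 / n2) / se


def raw_diff(y1, n1, y2, n2):
    """Observed difference in proportions, used directly as the test statistic."""
    return y1 / n1 - y2 / n2


def tables(n1, n2):
    """Every possible outcome (y1, y2) of the two independent binomial arms."""
    return [(y1, y2) for y1 in range(n1 + 1) for y2 in range(n2 + 1)]


def sup_over_nuisance(region, n1, n2, grid=1000):
    """Approximate sup over p of P_p((Y1, Y2) in region) under H0: p1 = p2 = p."""
    best = 0.0
    for i in range(grid + 1):
        p = i / grid
        f1 = [binom_pmf(k, n1, p) for k in range(n1 + 1)]
        f2 = [binom_pmf(k, n2, p) for k in range(n2 + 1)]
        best = max(best, sum(f1[y1] * f2[y2] for y1, y2 in region))
    return best


def unconditional_p(stat, x1, n1, x2, n2, tol=1e-9):
    """Exact unconditional p-value: large stat is extreme; ties within tol are included."""
    t_obs = stat(x1, n1, x2, n2)
    region = [(y1, y2) for y1, y2 in tables(n1, n2) if stat(y1, n1, y2, n2) >= t_obs - tol]
    return sup_over_nuisance(region, n1, n2)


def boschloo_p(x1, n1, x2, n2):
    """Boschloo: Fisher's p-value is the statistic (small is extreme), ties compared exactly."""
    f_obs = fisher_one_sided(x1, n1, x2, n2)
    region = [(y1, y2) for y1, y2 in tables(n1, n2) if fisher_one_sided(y1, n1, y2, n2) <= f_obs]
    return sup_over_nuisance(region, n1, n2)


if __name__ == "__main__":
    x1, n1, x2, n2 = 7, 10, 2, 10  # hypothetical counts, not data from the paper
    print("Fisher (conditional):   ", float(fisher_one_sided(x1, n1, x2, n2)))
    print("Unconditional, raw diff:", unconditional_p(raw_diff, x1, n1, x2, n2))
    print("Unconditional, score Z: ", unconditional_p(score_z, x1, n1, x2, n2))
    print("Boschloo:               ", boschloo_p(x1, n1, x2, n2))
```

The only difference between the unconditional variants is the statistic, and therefore the ordering of tables that defines the rejection region. This is the choice the abstract warns about. Fisher's p-value is a valid p-value conditionally on every total m, so the Boschloo p-value computed this way can never exceed Fisher's p-value. This matches the statement that Boschloo's test is uniformly more powerful. The raw-difference and score versions come with no such guarantee for an individual table.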
