Berger V W
Food and Drug Administration, Center for Biologics Evaluation and Research, 1401 Rockville Pike 200S, HFM-215, Rockville, MD 20852-1448, USA.
Stat Med. 2000 May 30;19(10):1319-28. doi: 10.1002/(sici)1097-0258(20000530)19:10<1319::aid-sim490>3.0.co;2-0.
Hypothesis testing, in which the null hypothesis specifies no difference between treatment groups, is an important tool in the assessment of new medical interventions. For randomized clinical trials, permutation tests that reflect the actual randomization are design-based analyses for such hypotheses. This means that only such design-based permutation tests can ensure internal validity, without which external validity is irrelevant. However, because permutation tests are conservative, their virtues continue to be debated in the literature, and conclusions are generally of the type that permutation tests should always be used or that they should never be used. A better conclusion might be that there are situations in which permutation tests should be used, and other situations in which they should not. This approach opens the door to broader agreement, but raises the obvious question of when to use permutation tests. We consider this issue from a variety of perspectives, and conclude that permutation tests are ideal for studying efficacy in a randomized clinical trial that compares, in a heterogeneous patient population, two or more treatments, each of which may be most effective in some patients, when the primary analysis does not adjust for covariates. We propose the p-value interval as a novel measure of the conservatism of a permutation test that can be defined independently of the significance level. This p-value interval can be used to ensure that the permutation test has both good global power and an acceptable degree of conservatism.
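As an illustration only, the following sketch enumerates an exact one-sided permutation test for a two-arm trial and reports a p-value interval. Several things here are assumptions rather than details taken from the abstract: the trial is assumed to use complete randomization with fixed group sizes, the test statistic is assumed to be the sum of outcomes in the treated arm, and the interval is assumed to run from the null probability of a strictly more extreme statistic up to the usual p-value, so that its length is the null probability of the observed statistic. The function name `permutation_p_value_interval` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def permutation_p_value_interval(treated, control):
    """Exact one-sided permutation test for two groups.

    Returns (lower, upper), where upper = P(T >= t_obs) is the usual
    permutation p-value and lower = P(T > t_obs). Treating this pair as
    the "p-value interval" is an assumption made for this sketch.
    """
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = sum(treated)  # statistic: total outcome in the treated arm

    n_greater = 0  # re-randomizations with a strictly larger statistic
    n_equal = 0    # re-randomizations that tie the observed statistic
    n_total = 0    # all possible allocations of n_treated patients

    # Enumerate every allocation the (assumed) randomization could produce.
    for indices in combinations(range(len(pooled)), n_treated):
        stat = sum(pooled[i] for i in indices)
        n_total += 1
        if stat > observed:
            n_greater += 1
        elif stat == observed:
            n_equal += 1

    lower = Fraction(n_greater, n_total)
    upper = Fraction(n_greater + n_equal, n_total)
    return lower, upper


# Hypothetical binary outcomes (1 = response), four patients per arm.
lower, upper = permutation_p_value_interval([1, 1, 1, 0], [0, 0, 1, 0])
print(lower, upper)  # prints "1/70 17/70"
```

In this made-up example the usual p-value is 17/70, while the probability of a strictly more extreme result is only 1/70. The wide gap between the two reflects how coarse the permutation distribution is for such a small trial, which is the kind of conservatism the abstract refers to. Full enumeration is feasible only for small samples; larger trials would need Monte Carlo sampling of allocations.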