Röhmel Joachim, Gerlinger Christoph, Benda Norbert, Läuter Jürgen
Department of Biostatistics and Clinical Epidemiology, Charité University Medicine, Berlin, Germany.
Biom J. 2006 Dec;48(6):916-33. doi: 10.1002/bimj.200510289.
In a clinical trial comparing an active treatment with a placebo, two (or even more) primary endpoints may be needed to describe the active treatment's benefit. We focus on a more specific situation with two primary endpoints, in which superiority in one of them suffices provided that non-inferiority is observed in the other. Several proposals in the literature address this or similar problems (e.g. Bloch et al. (2001, 2006) or Tamhane and Logan (2002, 2004)), but on closer inspection they prove insufficient or inadequate. For example, we were unable to find a good reason why a bootstrap p-value for superiority should depend on the initially selected non-inferiority margins or on the initially selected type I error alpha. We propose a hierarchical three-step procedure: in the first step, non-inferiority must be proven in both variables; in the second step, superiority has to be shown by a bivariate test (e.g. Holm (1979), O'Brien (1984), Hochberg (1988), a bootstrap test (Wang (1998)), or Läuter (1996)); and in the third step, superiority in at least one variable has to be verified by a corresponding univariate test. All statistical tests are performed at the same one-sided significance level alpha. Among the above-mentioned bivariate superiority tests, we preferred Läuter's SS test and the Holm procedure because both have been proven to control the type I error strictly, irrespective of the correlation structure among the primary variables and of the sample size. A simulation study reveals that the power of the bivariate test depends to a considerable degree on the correlation between the two primary endpoints and on the magnitude of their expected effects. The recommendation of which test to choose therefore depends on knowledge of the possible correlation between the two primary endpoints. In general, Läuter's SS procedure in step 2 shows the best overall properties, whereas Holm's procedure has an advantage if both a positive correlation between the two variables and a considerable difference between their standardized effect sizes can be expected.
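The decision logic of the three-step hierarchy can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that one-sided p-values for the univariate non-inferiority and superiority tests have already been computed elsewhere, it uses the Holm-type global criterion (smallest superiority p-value at most alpha/2) as the bivariate test in step 2, and it does not implement Läuter's SS test. The function name `three_step_procedure`, its arguments, and the default alpha of 0.025 are hypothetical choices made for illustration only.

```python
def three_step_procedure(p_noninf, p_sup, alpha=0.025):
    """Sketch of a hierarchical three-step decision for two primary endpoints.

    p_noninf: pair of one-sided p-values of the non-inferiority tests
    p_sup:    pair of one-sided p-values of the univariate superiority tests
    alpha:    one-sided significance level (0.025 is an assumed default)
    """
    result = {"noninferiority": False, "global_superiority": False, "superior": []}

    # Step 1: non-inferiority must be shown in BOTH endpoints at level alpha.
    if max(p_noninf) > alpha:
        return result
    result["noninferiority"] = True

    # Step 2: bivariate superiority test. Assumption: Holm-type global
    # criterion for two hypotheses, i.e. the smaller p-value must be
    # at most alpha / 2.
    if min(p_sup) > alpha / 2:
        return result
    result["global_superiority"] = True

    # Step 3: univariate superiority tests, each at level alpha.
    # Endpoints are reported by index (0 or 1).
    result["superior"] = [i for i, p in enumerate(p_sup) if p <= alpha]
    return result


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to exercise the three steps.
    print(three_step_procedure((0.001, 0.010), (0.004, 0.300)))
```

Because the procedure is hierarchical, each step is only reached if the previous one succeeded, which is why every test can be carried out at the same one-sided level alpha. With the Holm-type criterion assumed here, passing step 2 already implies that the endpoint with the smaller p-value passes step 3; a different bivariate test in step 2, such as Läuter's SS test, would not have that property.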