1 Pfizer Inc., New York, NY, USA.
2 Novartis Pharma AG, Basel, Switzerland.
Clin Trials. 2018 Oct;15(5):452-461. doi: 10.1177/1740774518770661.
Background: Well-designed phase II trials must have acceptable error rates relative to a pre-specified success criterion, usually a statistically significant p-value. From a clinical perspective, such standard designs may not always suffice, because establishing clinical relevance may require more than statistical significance. For example, proof-of-concept in phase II often requires not only statistical significance but also a sufficiently large effect estimate.
Purpose: We propose dual-criterion designs that complement statistical significance with clinical relevance, discuss their methodology, and illustrate their implementation in phase II.
Methods: Clinical relevance requires the effect estimate to pass a clinically motivated threshold, the decision value (DV). In contrast to standard designs, the required effect estimate is an explicit design input, whereas study power is implicit. Choosing the sample size for a dual-criterion design requires careful consideration of the study's operating characteristics (type I error, power).
Results: Dual-criterion designs are discussed for a randomized controlled and a single-arm phase II trial, including decision criteria, sample size calculations, decisions under various data scenarios, and operating characteristics. Because they combine a statistical and a clinical criterion, the designs facilitate GO/NO-GO decisions.
Limitations: While conceptually simple, implementing a dual-criterion design requires care. The clinical DV must be elicited carefully in collaboration with clinicians, and understanding how the design resembles and differs from a standard design is crucial.
Conclusion: To improve evidence-based decision-making, a formal yet transparent quantitative framework is important. Dual-criterion designs offer an appealing statistical-clinical compromise, which may be preferable to standard designs if evidence against the null hypothesis alone does not suffice for an efficacy claim.
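The dual-criterion decision rule described under Methods can be illustrated with a minimal sketch. It assumes a two-arm trial with a normally distributed effect estimate (difference in means), a known common standard deviation sigma, equal allocation, and a one-sided test of H0: theta <= 0. All function names are hypothetical, and the sample-size rule shown (the smallest n at which an estimate equal to the DV is already significant) is one illustrative choice rather than necessarily the exact procedure of the paper. It shows how the DV is an explicit input while power is only implicit: when the DV is the binding threshold, the probability of GO at a true effect equal to the DV is 50%.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def standard_error(sigma: float, n_per_arm: int) -> float:
    """SE of a difference in means for two arms of size n with common SD sigma (assumed known)."""
    return sigma * math.sqrt(2.0 / n_per_arm)


def go_decision(estimate: float, se: float, dv: float, alpha: float) -> bool:
    """GO only if BOTH criteria hold:
    1. statistical significance: one-sided p-value for H0: theta <= 0 is at most alpha;
    2. clinical relevance: the effect estimate reaches the decision value (DV)."""
    p_value = 1.0 - _STD_NORMAL.cdf(estimate / se)
    return p_value <= alpha and estimate >= dv


def sample_size_per_arm(sigma: float, dv: float, alpha: float) -> int:
    """Hypothetical rule: smallest n per arm with z_{1-alpha} * SE(n) <= DV,
    so that an estimate equal to the DV is already statistically significant."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    return math.ceil(2.0 * (z * sigma / dv) ** 2)


def prob_go(theta: float, sigma: float, n_per_arm: int, dv: float, alpha: float) -> float:
    """P(GO | true effect theta), assuming estimate ~ N(theta, SE^2).
    The GO region is estimate >= max(DV, z_{1-alpha} * SE)."""
    se = standard_error(sigma, n_per_arm)
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)
    threshold = max(dv, z * se)
    return 1.0 - _STD_NORMAL.cdf((threshold - theta) / se)


if __name__ == "__main__":
    sigma, dv, alpha = 1.0, 0.3, 0.05  # illustrative inputs, not from the paper
    n = sample_size_per_arm(sigma, dv, alpha)
    print("n per arm:", n)
    print("P(GO | theta = 0):", prob_go(0.0, sigma, n, dv, alpha))   # type I error, at most alpha
    print("P(GO | theta = DV):", prob_go(dv, sigma, n, dv, alpha))   # 0.5 when the DV binds
    # With a larger trial, a significant estimate can still fall short of the DV:
    print(go_decision(0.25, standard_error(sigma, 200), dv, alpha))  # significant, but below DV
```

With these illustrative inputs the DV, not the significance threshold, determines the GO region, so a larger trial does not by itself lower the bar for a GO decision; this is the practical difference from a standard design, where increasing n shrinks the smallest effect that counts as a success.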