Comparison of error rates in single-arm versus randomized phase II cancer clinical trials.
Author information
Mayo Clinic, 200 1st St SW, Rochester, MN 55905, USA.
Publication information
J Clin Oncol. 2010 Apr 10;28(11):1936-41. doi: 10.1200/JCO.2009.25.5489. Epub 2010 Mar 8.
PURPOSE To improve the understanding of the appropriate design of phase II oncology clinical trials, we compared error rates in single-arm, historically controlled designs and randomized, concurrently controlled designs.
PATIENTS AND METHODS We simulated the error rates of each design separately, using individual patient data from a large colorectal cancer phase III trial and statistical models that take into account random and systematic variation in historical control data.
RESULTS In single-arm trials, false-positive error rates (type I error) were 2 to 4 times those projected when modest drift or patient selection effects (eg, a 5% absolute shift in control response rate) were included in the statistical models. The power of single-arm designs simulated using actual data was highly sensitive to the fraction of patients from treatment centers with high versus low patient volumes, the presence of patient selection effects or temporal drift in response rates, and random small-sample variation in historical controls. Increasing the sample size did not correct the overoptimism of single-arm studies. The randomized two-arm design conformed to planned error rates.
CONCLUSION Variability in historical control success rates, outcome drift in patient populations over time, and/or patient selection effects can result in inaccurate false-positive and false-negative error rates in single-arm designs. The same factors leave the performance of the randomized two-arm design largely unaffected, at the cost of 2 to 4 times the sample size of a single-arm design. Given a large enough patient pool, randomized phase II designs provide a more accurate decision for screening agents before phase III testing.
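The type I error inflation described in the results can be illustrated with a small exact calculation. The sketch below is not the authors' simulation: it assumes a one-stage single-arm design tested with an exact one-sided binomial rule, and the sample size (50), historical response rate (20%), and nominal alpha (0.10) are hypothetical values chosen for illustration. Only the 5% absolute shift comes from the abstract. The function names are likewise illustrative.

```python
from math import comb


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n, p0, alpha):
    """Smallest response count r such that P(X >= r | p0) <= alpha."""
    for r in range(n + 2):  # r = n + 1 gives probability 0, so the loop always returns
        if binom_sf(r, n, p0) <= alpha:
            return r


def single_arm_type1(n, p_hist, drift, alpha=0.10):
    """Planned vs. actual type I error of a single-arm design.

    The rejection threshold is fixed using the historical control rate
    p_hist, but an ineffective agent is evaluated in a population whose
    true control rate has drifted to p_hist + drift.
    """
    r = critical_value(n, p_hist, alpha)
    nominal = binom_sf(r, n, p_hist)         # error rate the design plans for
    actual = binom_sf(r, n, p_hist + drift)  # error rate under the drifted rate
    return r, nominal, actual


if __name__ == "__main__":
    # Hypothetical design values; only the 5% drift is taken from the abstract.
    r, nominal, actual = single_arm_type1(n=50, p_hist=0.20, drift=0.05)
    print(f"reject H0 if responses >= {r}")
    print(f"planned type I error: {nominal:.3f}")
    print(f"actual type I error with 5% drift: {actual:.3f}")
    print(f"inflation factor: {actual / nominal:.1f}x")
```

Because the threshold is tied to a fixed historical rate, any upward shift in the true control rate is counted as treatment effect. A concurrently randomized control arm would experience the same shift as the experimental arm, which is the intuition behind the randomized design's robustness.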