Faculty of Medicine & Dentistry, University of Toronto, Toronto, ON, M5S 1A1, Canada.
Pragmatic Trials Collaborative, Faculty of Medicine & Dentistry, University of Alberta, 6-60 University Terrace, Edmonton, AB, T6G 2T4, Canada.
Trials. 2022 Sep 5;23(1):747. doi: 10.1186/s13063-022-06689-9.
Stopping trials early because of a favourable interim analysis can exaggerate benefit. This study simulated trials typical of those stopping early for benefit in the real world and estimated the degree to which early stopping likely overestimates benefit.
From 1 million simulated trials, we selected those that exceeded the interim stopping criteria and compared the apparent benefit at stopping with the true benefit used to generate the data. Each simulation randomly assigned the period of observation, number of subjects, and control event rate using normal distributions centred on the corresponding parameters of a template trial typical of real-world "truncated" (i.e. stopped for benefit) trials. The intervention's true relative risk reduction (RRR) was also randomized, assuming that 1% of drugs have a warfarin-like effect (60% RRR), 5% a statin-like effect (35% RRR), 39% an ASA-like effect (15% RRR), 50% no effect (0% RRR), and 5% cause harm (modelled as a 20% relative risk increase). Trials had a single interim analysis with a stopping z-value of 2.782 (O'Brien-Fleming threshold). We also modelled (1) a large truncated trial based on the SPRINT blood pressure trial (using SPRINT's parameters and stopping criteria) and (2) the same typical truncated trials carried to completion as planned with no interim analysis.
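The selection procedure described above can be illustrated with a minimal Python sketch. It is not the authors' code. The effect mixture and the z-threshold of 2.782 come from the abstract; everything else is an assumption made for illustration: the template values (`mean_n_per_arm`, `sd_n`, `mean_cer`, `sd_cer`), an interim look at half of enrolment, a one-sided pooled two-proportion z-test, and a far smaller number of simulated trials. The period of observation is not modelled.

```python
import math
import random
import statistics

# True-effect mixture from the abstract: (probability, true RRR).
# A negative RRR represents harm (a 20% relative risk increase).
EFFECT_MIX = [(0.01, 0.60), (0.05, 0.35), (0.39, 0.15), (0.50, 0.00), (0.05, -0.20)]
Z_STOP = 2.782  # interim stopping threshold quoted in the abstract


def draw_true_rrr(rng):
    """Pick a true RRR according to the assumed mixture of drug effects."""
    probs, rrrs = zip(*EFFECT_MIX)
    return rng.choices(rrrs, weights=probs, k=1)[0]


def count_events(rng, n, risk):
    """Number of subjects out of n who have an event, each with probability risk."""
    return sum(rng.random() < risk for _ in range(n))


def simulate_trial(rng, mean_n_per_arm=400, sd_n=100, mean_cer=0.15, sd_cer=0.04):
    """Simulate one trial up to a single interim look.

    The template values are hypothetical; the abstract does not report them.
    Returns (stopped_early, observed_rrr, true_rrr).
    """
    n_arm = max(50, round(rng.gauss(mean_n_per_arm, sd_n)))
    cer = min(0.5, max(0.01, rng.gauss(mean_cer, sd_cer)))  # control event rate
    true_rrr = draw_true_rrr(rng)
    n_interim = n_arm // 2  # assumption: interim look at half of enrolment

    ev_c = count_events(rng, n_interim, cer)
    ev_t = count_events(rng, n_interim, cer * (1 - true_rrr))
    p_c, p_t = ev_c / n_interim, ev_t / n_interim

    # Pooled two-proportion z-test (an assumed choice of test statistic).
    pooled = (ev_c + ev_t) / (2 * n_interim)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_interim)
    if se == 0 or p_c == 0:
        return False, None, true_rrr

    z = (p_c - p_t) / se
    observed_rrr = 1 - p_t / p_c
    return z > Z_STOP, observed_rrr, true_rrr


def simulate(n_trials=5000, seed=1):
    """Return observed-minus-true RRR for every trial that stopped early."""
    rng = random.Random(seed)
    overestimates = []
    for _ in range(n_trials):
        stopped, observed, true = simulate_trial(rng)
        if stopped:
            overestimates.append(observed - true)
    return overestimates


if __name__ == "__main__":
    diffs = simulate()
    print(f"truncated trials: {len(diffs)}")
    print(f"median absolute RRR overestimate: {statistics.median(diffs):.3f}")
```

Because the template values here are invented and the trials are small, the size of the overestimate this sketch prints should not be compared with the figures reported in the abstract; it only shows the mechanism of selecting trials on a favourable interim result.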
For typical truncated trials, the true RRR was roughly two-thirds of the observed RRR at the time of stopping. The RRR was overestimated by an absolute 14.9% (median; IQR 6.4 to 24.6) in typical truncated trials, by 5.3% (IQR -0.1 to 11.4) in the same trials when carried to completion, and by 2.3% (IQR 0.98-1.09) in large SPRINT-like trials. In all models, at least 250 events were required to keep the absolute RRR overestimate below 5%.
Simulated trials typical of those stopping early for benefit overestimate the true relative risk reduction by roughly 50% (i.e. the true RRR was 2/3 of the observed value). Overestimation was much smaller, and likely unimportant, when simulating large SPRINT-like trials stopping early. Whether trials were large or small, stopped early or not, a minimum of 250 events was needed to avoid overestimating the relative risk reduction by an absolute 5% or more.
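The link between "true RRR is 2/3 of the observed value" and "overestimated by roughly 50%" is a matter of arithmetic, spelled out here as a short worked step. The 30% and 45% figures in the example are hypothetical and chosen only for illustration.

```latex
\[
\mathrm{RRR}_{\text{true}} \approx \tfrac{2}{3}\,\mathrm{RRR}_{\text{obs}}
\;\Longrightarrow\;
\mathrm{RRR}_{\text{obs}} \approx \tfrac{3}{2}\,\mathrm{RRR}_{\text{true}}
\;\Longrightarrow\;
\frac{\mathrm{RRR}_{\text{obs}} - \mathrm{RRR}_{\text{true}}}{\mathrm{RRR}_{\text{true}}}
\approx \tfrac{1}{2} = 50\%
\]
```

For example, a drug with a true RRR of 30% would, on this ratio, show an observed RRR of about 45% at the time of stopping: a relative overestimate of 50% and an absolute overestimate of 15 percentage points.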