Sherry Alexander D, Msaouel Pavlos, Miller Avital M, Lin Timothy A, Abi Jaoude Joseph, Kouzy Ramez, Passy Adina H, Meirson Tomer, Ignatiadis Nikolaos, McCaw Zachary R, van Zwet Erik, Ludmir Ethan B
Department of Radiation Oncology, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Radiation Oncology, Mayo Clinic, Rochester, MN, USA.
Department of Genitourinary Medical Oncology, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Eur J Cancer. 2025 Jul 4;226:115596. doi: 10.1016/j.ejca.2025.115596.
PURPOSE: The conventional assumption that P values ≤ 0.05 imply reproducible effects has come under recent criticism. This concern is particularly relevant in oncology, as phase III oncology trials, which directly inform practice, are usually not repeated. Using advanced modeling techniques, we investigated the relationship between P values and reproducibility in oncology.
METHODS: We obtained the signal-to-noise ratio distribution in phase III oncology using outcomes from 632 two-arm superiority trials enrolling 496,219 patients. With this distribution, we estimated successful replication probability as the probability that a replicate trial, having the same design, effect size, and standard error, would have a two-sided P ≤ 0.05 and the same effect directionality as the original trial. We also estimated the following: the probability that the estimated effect had the same direction as the true effect (i.e., correct sign probability); the probability that the 95% CI covered the true effect (i.e., coverage probability); and the ratio of the observed estimated effect to the true effect (i.e., exaggeration factor).
RESULTS: The median exaggeration factor across all trials was 1.09 (IQR, 0.80-1.61). When P ≤ 0.05 in the original trial, mean correct sign probabilities were ≥ 97% and mean coverage probabilities were between 93% and 96%. However, effects at P of 0.05, 0.01, and 0.001 had mean replication probabilities of 43% (95% CI: 35-45%), 60% (95% CI: 53-61%), and 77% (95% CI: 71-79%), respectively. For trials with an overall survival primary endpoint that led directly to regulatory approval, the median replication probability was 66%. A user-friendly web interface is provided to facilitate estimation of replication probabilities of individual trials.
CONCLUSIONS: While the direction of observed effects is likely correct when P ≤ 0.05, treatment effects at P of 0.05 in phase III oncology trials are unlikely to be replicated successfully. By itself, statistical significance should not be equated with high replication probability.
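The replication probability defined in the methods can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the original z statistic is normally distributed around the true signal-to-noise ratio (SNR) with unit variance, and it stands in a hypothetical zero-mean normal mixture for the SNR distribution that the study estimated from trial data. The mixture weights and standard deviations, and the function names, are placeholders chosen for illustration only, so the outputs will not reproduce the figures reported above. The naive plug-in version, which treats the observed z as the true SNR, is included for contrast.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided 5% critical value

# HYPOTHETICAL prior on the SNR (true effect / standard error), written as
# (weight, standard deviation) pairs of a zero-mean normal mixture.
# Placeholder values, NOT the distribution fitted in the study.
HYPOTHETICAL_SNR_PRIOR = [(0.50, 0.8), (0.35, 2.0), (0.15, 4.0)]


def z_from_p(p_two_sided):
    """Absolute z statistic corresponding to a two-sided P value."""
    return STD_NORMAL.inv_cdf(1.0 - p_two_sided / 2.0)


def naive_replication_probability(z):
    """Plug-in estimate: assume the true SNR equals the observed z."""
    return 1.0 - STD_NORMAL.cdf(Z_CRIT - abs(z))


def shrunk_replication_probability(z, prior=HYPOTHETICAL_SNR_PRIOR):
    """P(replicate has two-sided P <= 0.05 in the same direction | z).

    Under the prior, z is marginally N(0, 1 + tau^2) within each component,
    and the replicate z statistic given z is N(s * z, 1 + s) with
    s = tau^2 / (1 + tau^2), the component's shrinkage factor.
    """
    z = abs(z)  # the problem is symmetric in the sign of the effect
    # Posterior component weights are proportional to prior weight times
    # the marginal density of the observed z under that component.
    post = [w * NormalDist(0.0, (1.0 + tau**2) ** 0.5).pdf(z) for w, tau in prior]
    total = sum(post)
    prob = 0.0
    for (_, tau), pw in zip(prior, post):
        s = tau**2 / (1.0 + tau**2)
        predictive = NormalDist(s * z, (1.0 + s) ** 0.5)
        prob += (pw / total) * (1.0 - predictive.cdf(Z_CRIT))
    return prob


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        z = z_from_p(p)
        print(p, round(naive_replication_probability(z), 3),
              round(shrunk_replication_probability(z), 3))
```

Under these assumptions the plug-in estimate at P = 0.05 is exactly one half, while any shrinkage toward zero pulls the estimate below one half, which is the qualitative pattern the abstract describes.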