Köppe Jeanette, Micheloud Charlotte, Erdmann Stella, Heyard Rachel, Held Leonhard
Institute of Biostatistics and Clinical Research, University of Muenster, Münster, Germany.
Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Zürich, Switzerland.
BMC Med Res Methodol. 2025 May 24;25(1):141. doi: 10.1186/s12874-025-02589-z.
The standard regulatory approach to assess replication success is the two-trials rule, requiring both the original and the replication study to be significant with effect estimates in the same direction. The sceptical p-value was recently presented as an alternative method for the statistical assessment of the replicability of study results.
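The two-trials rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the effect estimates are assumed to be approximately normal with known standard errors, and the one-sided level of 0.025 per study is a conventional choice that this abstract does not state.

```python
from statistics import NormalDist


def one_sided_p(estimate: float, se: float) -> float:
    """One-sided p-value for H0: effect <= 0 versus H1: effect > 0,
    assuming an approximately normal estimate with standard error `se`."""
    return 1.0 - NormalDist().cdf(estimate / se)


def two_trials_rule(est_o: float, se_o: float,
                    est_r: float, se_r: float,
                    alpha: float = 0.025) -> bool:
    """Declare replication success only if BOTH studies are significant
    at level `alpha` in the same (here: positive) direction.
    alpha = 0.025 is an assumed conventional level, not taken from the source."""
    return (one_sided_p(est_o, se_o) <= alpha
            and one_sided_p(est_r, se_r) <= alpha)
```

Because the rule thresholds each p-value separately, two independent studies with no true effect both pass with probability alpha squared (0.025 × 0.025 = 0.000625) in this one-sided formulation.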
We review the statistical properties of the sceptical p-value and compare them with those of the two-trials rule. We extend the methodology to non-inferiority trials and describe how to invert the sceptical p-value to obtain confidence intervals. We illustrate the performance of the different methods using real-world evidence emulations of randomized controlled trials (RCTs) conducted within the RCT DUPLICATE initiative.
The sceptical p-value depends not only on the two p-values, but also on the sample sizes and effect sizes of the two studies. It can be calibrated to have the same Type-I error rate as the two-trials rule, but has larger power to detect an existing effect. In the application to the results from the RCT DUPLICATE initiative, the sceptical p-value leads to results that are qualitatively similar to those of the two-trials rule, but tends to show more evidence for treatment effects.
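The following sketch illustrates how such a quantity can depend on more than the two p-values. It is an assumption-laden illustration, not the method as specified in the paper: it uses one commonly cited formulation of the uncalibrated one-sided sceptical p-value, in which the squared sceptical z-value x solves (z_o²/x − 1)(z_r²/x − 1) = c, with c the ratio of the original to the replication variance. The calibration to the Type-I error rate of the two-trials rule mentioned above is not implemented, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def sceptical_z2(z_o: float, z_r: float, c: float) -> float:
    """Squared sceptical z-value x solving (z_o^2/x - 1)(z_r^2/x - 1) = c.
    c = se_o^2 / se_r^2 is the variance ratio (roughly n_r / n_o)."""
    z_o2, z_r2 = z_o ** 2, z_r ** 2
    if z_o2 == 0.0 or z_r2 == 0.0:
        return 0.0  # no evidence in one study: sceptical z-value is zero
    z_a2 = (z_o2 + z_r2) / 2.0                 # arithmetic mean of squared z-values
    z_h2 = 2.0 / (1.0 / z_o2 + 1.0 / z_r2)     # harmonic mean of squared z-values
    if math.isclose(c, 1.0):
        return z_h2 / 2.0                      # limiting case of equal variances
    return (math.sqrt(z_a2 * (z_a2 + (c - 1.0) * z_h2)) - z_a2) / (c - 1.0)


def sceptical_p(est_o: float, se_o: float, est_r: float, se_r: float) -> float:
    """Uncalibrated one-sided sceptical p-value (assumed formulation)."""
    c = se_o ** 2 / se_r ** 2
    z_s = math.sqrt(sceptical_z2(est_o / se_o, est_r / se_r, c))
    p = 1.0 - NormalDist().cdf(z_s)
    # Estimates in opposite directions count as evidence against replication.
    return p if est_o * est_r > 0 else 1.0 - p
```

In this uncalibrated form the sceptical z-value is never larger than either study's own z-value, so the resulting p-value is never smaller than the two individual one-sided p-values. The variance ratio c enters the formula directly, which is how the relative sample sizes of the two studies affect the result.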
The sceptical p-value represents a valid statistical measure to assess the replicability of study results and is useful in the context of real-world evidence emulations.