Held Leonhard, Pawel Samuel, Micheloud Charlotte
Epidemiology Biostatistics and Prevention Institute (EBPI) and Center for Reproducible Science (CRS), University of Zurich, Hirschengraben 84, Zurich 8001, Switzerland.
R Soc Open Sci. 2024 Aug 28;11(8):240149. doi: 10.1098/rsos.240149. eCollection 2024 Aug.
Statistical significance of both the original and the replication study is a commonly used criterion to assess replication attempts, also known as the two-trials rule in drug development. However, replication studies are sometimes conducted although the original study is non-significant, in which case Type-I error rate control across both studies is no longer guaranteed. We propose an alternative method to assess replicability using the sum of p-values from the two studies. The approach provides a combined p-value and can be calibrated to control the overall Type-I error rate at the same level as the two-trials rule, but allows for replication success even if the original study is non-significant. The unweighted version requires a less restrictive level of significance at replication if the original study is already convincing, which facilitates sample size reductions of up to 10%. Downweighting the original study accounts for possible bias and requires a more stringent significance level and larger sample sizes at replication. Data from four large-scale replication projects are used to illustrate and compare the proposed method with the two-trials rule, meta-analysis and Fisher's combination method.
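The unweighted version of the idea can be illustrated with a minimal sketch. It is not taken from the article: it assumes two independent one-sided p-values that are uniformly distributed under the null hypothesis, so that their sum s has distribution function s^2/2 for s <= 1 and 1 - (2 - s)^2/2 for 1 < s <= 2. It also assumes the two-trials rule is applied at one-sided level alpha = 0.025 per study, giving an overall Type-I error rate of alpha^2. The function names are hypothetical.

```python
import math


def sum_p_combined(p_orig: float, p_rep: float) -> float:
    """Combined p-value for the unweighted sum of two p-values.

    Assumes both p-values are one-sided, independent and uniform on
    [0, 1] under the null, so their sum follows a triangular
    distribution on [0, 2].
    """
    s = p_orig + p_rep
    if s <= 1.0:
        return s * s / 2.0
    return 1.0 - (2.0 - s) ** 2 / 2.0


def sum_threshold(alpha: float = 0.025) -> float:
    """Largest sum of p-values that still counts as success.

    Solves s^2 / 2 = alpha^2, i.e. the overall Type-I error rate
    matches the two-trials rule at per-study level alpha (assumed).
    """
    return math.sqrt(2.0) * alpha


def two_trials_success(p_orig: float, p_rep: float, alpha: float = 0.025) -> bool:
    """Two-trials rule: both studies significant at level alpha."""
    return p_orig <= alpha and p_rep <= alpha


def sum_p_success(p_orig: float, p_rep: float, alpha: float = 0.025) -> bool:
    """Sum-of-p-values criterion calibrated to overall level alpha^2."""
    return sum_p_combined(p_orig, p_rep) <= alpha ** 2


if __name__ == "__main__":
    # Illustrative p-values only, not data from the article.
    for p_o, p_r in [(0.03, 0.004), (0.02, 0.02), (0.001, 0.03)]:
        print(p_o, p_r, two_trials_success(p_o, p_r), sum_p_success(p_o, p_r))
```

Under these assumptions the bound on the sum is about 0.035 for alpha = 0.025. A non-significant original study (p = 0.03) followed by a strong replication (p = 0.004) passes the sum criterion but not the two-trials rule, while two borderline results (p = 0.02 each) pass the two-trials rule but not the sum criterion. The weighted variant that downweights the original study is not covered by this sketch.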