Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands.
Department of Statistics, Columbia University, New York.
NEJM Evid. 2024 Jan;3(1):EVIDoa2300003. doi: 10.1056/EVIDoa2300003. Epub 2023 Dec 22.
BACKGROUND: We have examined the primary efficacy results of 23,551 randomized clinical trials from the Cochrane Database of Systematic Reviews.
METHODS: We estimate that the great majority of trials have much lower statistical power for actual effects than the 80% or 90% for the stated effect sizes. Consequently, “statistically significant” estimates tend to seriously overestimate actual treatment effects, “nonsignificant” results often correspond to important effects, and efforts to replicate often fail to achieve “significance” and may even appear to contradict initial results. To address these issues, we reinterpret the P value in terms of a reference population of studies that are, or could have been, in the Cochrane Database.
RESULTS: This leads to an empirical guide for the interpretation of an observed P value from a “typical” clinical trial in terms of the degree of overestimation of the reported effect, the probability of the effect’s sign being wrong, and the predictive power of the trial.
CONCLUSIONS: Such an interpretation provides additional insight about the effect under study and can guard medical researchers against naive interpretations of the P value and overoptimistic effect sizes. Because many research fields suffer from low power, our results are also relevant outside the medical domain. (Funded by the U.S. Office of Naval Research.)
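The link between low power and overestimated "significant" effects can be illustrated with a small calculation. The sketch below is not the authors' method, which relies on the empirical distribution of trials in the Cochrane Database. It assumes a single, normally distributed effect estimate with known standard error and a two-sided test, and the function name `design_analysis` and its parameters are hypothetical. For a given true effect, it computes the power, the probability that a significant estimate has the wrong sign, and the average factor by which a significant estimate overstates the true effect.

```python
from statistics import NormalDist


def design_analysis(true_effect, std_error, alpha=0.05):
    """Power, sign-error probability, and exaggeration factor for one trial.

    Illustrative assumption: the estimate is Normal(true_effect, std_error**2)
    and "significance" means |estimate / std_error| exceeds the two-sided
    critical value at level alpha.
    """
    if true_effect <= 0 or std_error <= 0:
        raise ValueError("true_effect and std_error must be positive")

    std = NormalDist()                       # standard normal distribution
    z_crit = std.inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = 0.05
    snr = true_effect / std_error            # signal-to-noise ratio

    # Probability of a significant result with the correct / the wrong sign.
    p_right = 1 - std.cdf(z_crit - snr)
    p_wrong = std.cdf(-z_crit - snr)
    power = p_right + p_wrong

    # Chance that the sign is wrong, given that the result is significant.
    sign_error = p_wrong / power

    # Expected |z| over the significant region (truncated-normal means),
    # then rescaled to a ratio against the true signal-to-noise ratio.
    mean_abs_sig = (snr * (p_right - p_wrong)
                    + std.pdf(z_crit - snr) + std.pdf(z_crit + snr))
    exaggeration = mean_abs_sig / power / snr

    return power, sign_error, exaggeration


if __name__ == "__main__":
    # Smaller signal-to-noise ratios mean lower power and more exaggeration.
    for snr in (0.5, 1.0, 2.0, 2.8):
        power, sign_error, exaggeration = design_analysis(snr, 1.0)
        print(f"SNR={snr:.1f}  power={power:.2f}  "
              f"sign error={sign_error:.3f}  exaggeration={exaggeration:.2f}")
```

Under these assumptions, a trial designed with 80% power (signal-to-noise ratio near 2.8) overstates the effect only modestly when it reaches significance, whereas a trial whose true signal-to-noise ratio is 1 overstates it by more than a factor of two on average.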