Reito Aleksi, Raittio Lauri, Helminen Olli
Department of Orthopaedics, Tampere University and Tampere University Hospital, Tampere, Finland.
Department of Medicine and Health Technology, Tampere University, Tampere, Finland.
JBJS Rev. 2020 Feb;8(2):e0079. doi: 10.2106/JBJS.RVW.19.00079.
A study published in 2001 reported that sample sizes in the randomized controlled trials (RCTs) published in major orthopaedic journals in 1997 were too small, resulting in low power to detect reasonable effect sizes. Low power is the fundamental reason for the poor reproducibility of research findings and serves to erode a cornerstone of the scientific method. The aim of this study was to ascertain whether improvements have been made in orthopaedic research during the past 2 decades.
The electronic tables of contents from the 2016 and 2017 volumes of 7 major orthopaedic journals were searched issue by issue in chronological order to identify possible RCTs. A posteriori (after-the-fact) power to detect small, medium, and large effect sizes, defined by the Cohen d value, was calculated from the sample sizes reported in the studies. The power to detect effect sizes associated with the most commonly used patient-reported outcome measures (PROMs) was also calculated. Finally, the use of a priori power analysis in the included studies was assessed.
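The a posteriori power calculation described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes a two-sided comparison of two equal-sized groups at alpha = 0.05, uses a normal approximation rather than the noncentral t distribution, and takes Cohen's conventional thresholds (d = 0.2, 0.5, 0.8) for small, medium, and large effects. The function name `approx_power` and the group size of 64 are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation (an assumption of this sketch): the test
    statistic is treated as normal with mean d * sqrt(n / 2).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    shift = d * sqrt(n_per_group / 2)      # standardized mean shift
    # Probability of rejecting in either tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical trial with 64 patients per arm.
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:6s} d={d}: power = {approx_power(d, 64):.2f}")
```

Under these assumptions, a trial of this size sits close to the 0.80 threshold for a medium effect but far below it for a small one, which is the pattern the study examines.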
In total, 233 studies were included in the final analyses. None of the negative studies had sufficient power (≥0.80) to detect a small effect size. Only between 15.0% and 32.1% of the negative studies had adequate power to detect a medium effect size. When categorized by anatomic region, 0% to 52.6% had adequate power to detect an effect size corresponding to the minimal clinically important difference (MCID). An a priori power analysis was employed in 196 (84%) of the 233 studies. However, the power analysis could not be replicated in 46% of the studies that used a mean comparison.
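To give a sense of what "adequate power" demands in practice, the following sketch inverts the same calculation and estimates the sample size per group needed to reach a power of 0.80. It is an illustration only, assuming a two-sided comparison of two equal-sized groups at alpha = 0.05 and a normal approximation (an exact t-based calculation gives slightly larger numbers); `required_n_per_group` is a hypothetical helper, not a method taken from the study.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size for a two-sample mean comparison.

    Uses the normal-approximation formula
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label:6s} d={d}: about {required_n_per_group(d)} per group")
```

Because the required sample size scales with 1/d², detecting a small effect needs roughly six times as many patients per group as detecting a medium effect.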
Although small improvements in orthopaedic RCTs have occurred during the past 2 decades, many RCTs are still underpowered: the sample sizes are still too small to provide adequate power to detect effects that would be deemed clinically relevant.