Moher D, Dulberg CS, Wells GA
Clinical Epidemiology Unit, Loeb Medical Research Institute, Ottawa Civic Hospital, Ontario, Canada.
JAMA. 1994 Jul 13;272(2):122-4.
To describe the pattern over time in the level of statistical power and the reporting of sample size calculations in published randomized controlled trials (RCTs) with negative results.
Our study was a descriptive survey. Power to detect 25% and 50% relative differences was calculated for the subset of trials with negative results in which a simple two-group parallel design was used. Criteria were developed both to classify trial results as positive or negative and to identify the primary outcomes. Power calculations were based on results from the primary outcomes reported in the trials.
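The abstract does not state which formulas were used for the power calculations. As a hedged illustration only, the sketch below computes approximate power for the two outcome types the study covered (dichotomous and continuous) in a simple two-group parallel design. It assumes a two-sided α of 0.05, equal group sizes, the normal approximation, and that a "relative difference" means a reduction of that fraction from the control-group event rate or mean. The function names and example inputs are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: approximate power of a two-group parallel trial to
# detect a given relative difference. Assumptions (not from the paper):
# two-sided alpha, equal group sizes, normal approximation.
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power_dichotomous(p_control: float, relative_diff: float,
                      n_per_group: int, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test comparing two independent proportions.

    The treatment-group event rate is assumed to be p_control reduced by
    relative_diff (e.g. 0.25 means a 25% relative reduction).
    """
    p_treat = p_control * (1 - relative_diff)
    delta = p_control - p_treat
    p_bar = (p_control + p_treat) / 2
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Standard error under the null hypothesis (pooled proportion).
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error under the alternative hypothesis.
    se_alt = sqrt((p_control * (1 - p_control)
                   + p_treat * (1 - p_treat)) / n_per_group)
    # Probability of rejecting H0 in the direction of the true difference;
    # the negligible opposite tail is ignored.
    return STD_NORMAL.cdf((delta - z_alpha * se_null) / se_alt)


def power_continuous(mean_control: float, sd: float, relative_diff: float,
                     n_per_group: int, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test comparing two means with a common SD."""
    delta = relative_diff * abs(mean_control)  # absolute difference to detect
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)  # SE of the difference in means
    return STD_NORMAL.cdf(delta / se - z_alpha)
```

For example, `power_dichotomous(0.5, 0.5, 58)` evaluates the power to detect a drop in event rate from 50% to 25% with 58 patients per arm. Holding the arm size fixed, a 25% relative difference gives lower power than a 50% one, consistent with fewer trials reaching 80% power for the 25% difference (16%) than for the 50% difference (36%).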
We reviewed all 383 RCTs published in JAMA, Lancet, and the New England Journal of Medicine in 1975, 1980, 1985, and 1990.
Twenty-seven percent of the 383 RCTs (n = 102) were classified as having negative results. The number of published RCTs more than doubled from 1975 to 1990, with the proportion of trials with negative results remaining fairly stable. Of the simple two-group parallel design trials having negative results with dichotomous or continuous primary outcomes (n = 70), only 16% and 36% had sufficient statistical power (80%) to detect a 25% or 50% relative difference, respectively. These percentages did not consistently increase over time. Overall, only 32% of the trials with negative results reported sample size calculations, but the percentage doing so has improved over time from 0% in 1975 to 43% in 1990. Only 20 of the 102 reports made any statement related to the clinical significance of the observed differences.
Most trials with negative results did not have large enough sample sizes to detect a 25% or a 50% relative difference. This result has not changed over time. Few trials discussed whether the observed differences were clinically important. There are important reasons to change this practice. The reporting of statistical power and sample size also needs to be improved.
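A sample size calculation inverts the power calculation: the target power (here 80%) is fixed and the required number of patients per group is solved for. The sketch below is a hedged illustration under the same assumptions as a standard textbook approach (two-sided α of 0.05, equal group sizes, normal approximation, relative difference taken as a reduction from the control value); it is not the method reported in the paper, and the function names and inputs are hypothetical.

```python
# Hypothetical sketch: patients per group needed for a target power in a
# two-group parallel trial. Assumptions (not from the paper): two-sided
# alpha, equal group sizes, normal approximation.
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def n_per_group_dichotomous(p_control: float, relative_diff: float,
                            power: float = 0.80, alpha: float = 0.05) -> int:
    """Sample size per group for comparing two proportions."""
    p_treat = p_control * (1 - relative_diff)
    delta = p_control - p_treat
    p_bar = (p_control + p_treat) / 2
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treat * (1 - p_treat))) ** 2
    # Round up: a fractional patient cannot be enrolled.
    return ceil(numerator / delta ** 2)


def n_per_group_continuous(mean_control: float, sd: float,
                           relative_diff: float, power: float = 0.80,
                           alpha: float = 0.05) -> int:
    """Sample size per group for comparing two means with a common SD."""
    delta = relative_diff * abs(mean_control)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)
```

With a hypothetical 50% control event rate, detecting a 25% relative difference requires roughly four times as many patients per group as detecting a 50% relative difference, because the required sample size scales approximately with the inverse square of the absolute difference.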