Froud Robert, Rajendran Dévan, Patel Shilpa, Bright Philip, Bjørkli Tom, Eldridge Sandra, Buchbinder Rachelle, Underwood Martin
Department of Health Sciences, Kristiania University College, Oslo, Norway.
Warwick Medical School, University of Warwick, Coventry, UK.
Spine (Phila Pa 1976). 2017 Jun 1;42(11):E680-E686. doi: 10.1097/BRS.0000000000001953.
A systematic review of nonspecific low back pain trials published between 1980 and 2012.
To explore what proportion of trials have been powered to detect different bands of effect size; whether there is evidence that sample size in low back pain trials has been increasing; what proportion of trial reports include a sample size calculation; and whether the likelihood of reporting a sample size calculation has increased.
Clinical trials should have a sample size sufficient to detect a minimally important difference for a given power and type I error rate. An underpowered trial is one in which the probability of a type II error is too high. Meta-analyses do not mitigate underpowered trials.
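To illustrate the relationship between effect size, power, and sample size described above, the following is a minimal sketch using the standard normal approximation for a two-arm comparison of means. The two-sided type I error rate of 0.05 and power of 0.80 are conventional values assumed here for illustration, and the function name `n_per_group` is hypothetical; none of these is taken from the review itself.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(smd, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a standardized
    mean difference `smd` in a two-arm trial with equal allocation.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / smd^2.
    alpha and power defaults are assumed conventions, not values from the review.
    """
    if smd <= 0:
        raise ValueError("smd must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the power
    return ceil(2 * (z_alpha + z_beta) ** 2 / smd ** 2)


if __name__ == "__main__":
    for d in (0.5, 0.3):
        print(f"SMD {d}: about {n_per_group(d)} participants per arm")
```

Under these assumptions, halving the target effect size roughly quadruples the required sample size, which is why trials aiming to detect small effects need to be considerably larger.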
Reviewers independently abstracted data on sample size at the point of analysis, whether a sample size calculation was reported, and year of publication. Descriptive analyses were used to explore the ability of trials to detect effect sizes, and regression analyses to explore the relationship between sample size, or reporting of sample size calculations, and time.
We included 383 trials. One-third were powered to detect a standardized mean difference of less than 0.5, and 5% were powered to detect a standardized mean difference of less than 0.3. The average sample size was 153 people, which increased only slightly (∼4 people/yr) from 1980 to 2000, and declined slightly (∼4.5 people/yr) from 2005 to 2011 (P < 0.00005). Sample size calculations were reported in 41% of trials. The odds of reporting a sample size calculation (compared to not reporting one) increased until 2005 and then declined (the equation is included in the full-text article).
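As a rough way to read these figures, the sketch below computes the smallest standardized mean difference that a two-arm trial of a given size could detect. It assumes a two-sided type I error rate of 0.05, power of 0.80, and a normal approximation; these values and the function name `min_detectable_smd` are illustrative assumptions and may differ from the method used in the review.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_smd(n1, n2, alpha=0.05, power=0.80):
    """Smallest standardized mean difference detectable with arm sizes
    n1 and n2, using the normal approximation
    d = (z_{1-alpha/2} + z_{power}) * sqrt(1/n1 + 1/n2).

    alpha and power defaults are assumed conventions, not values from the review.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("arm sizes must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile corresponding to the power
    return (z_alpha + z_beta) * sqrt(1 / n1 + 1 / n2)


if __name__ == "__main__":
    # A hypothetical two-arm trial with the average total sample size of 153,
    # split as evenly as possible between the arms.
    print(f"Detectable SMD: {min_detectable_smd(77, 76):.2f}")
```

Under these assumptions, a trial with 153 participants split evenly across two arms could detect a standardized mean difference of roughly 0.45, and trials with more than two arms would detect only larger differences for the same total sample size.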
Sample sizes in back pain trials and the reporting of sample size calculations may need to be increased. It may be justifiable to power a trial to detect only large effects in the case of novel interventions.
Level of Evidence: 3