The Faculty of Medicine and Health Technology, Tampere University, Arvo Ylpön katu 34, 33520, Tampere, Finland.
Department of Orthopaedics and Traumatology, Tampere University Hospital, Teiskontie 35, 33520, Tampere, Finland.
BMC Med Res Methodol. 2021 Mar 24;21(1):59. doi: 10.1186/s12874-021-01249-2.
Randomized controlled trials in orthopaedics are mainly powered to detect large effect sizes. A possible discrepancy between the estimated and the true mean difference is a challenge for statistical inference based on p-values. We explored the justifications given for the mean difference estimates used in power calculations. We also assessed the distribution of observations in the primary outcome and the possibility of ceiling effects.
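As an illustration of why the assumed mean difference matters, the following minimal sketch uses the standard normal-approximation formula for a two-arm trial with 1:1 allocation and a continuous outcome. The function names, the default alpha and power, and the example numbers are assumptions chosen for illustration; they are not taken from the reviewed trials.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and SD 1


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sd / delta)^2.
    `delta` is the assumed mean difference, `sd` the assumed common SD.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)


def achieved_power(true_delta, sd, n, alpha=0.05):
    """Approximate power when the true mean difference is `true_delta`.

    Ignores the negligible probability of rejecting in the wrong direction.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2 / n)
    return _STD_NORMAL.cdf(true_delta / standard_error - z_alpha)


if __name__ == "__main__":
    # Hypothetical trial: planned for a 10-point difference with SD 20.
    n = n_per_group(delta=10, sd=20)
    print("n per group:", n)
    print("power if the true difference is 10:", round(achieved_power(10, 20, n), 2))
    print("power if the true difference is 5:", round(achieved_power(5, 20, n), 2))
```

Under these assumptions, a trial sized for an optimistic mean difference loses most of its power if the true difference is only half as large, which is why the rationale for the estimate deserves scrutiny.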
We conducted a systematic review of randomized controlled trials with power calculations published between 2016 and 2019 in eight clinical orthopaedic journals. Trials with one continuous primary outcome and 1:1 allocation were eligible. Rationales and references for the mean difference estimate were recorded from the Methods sections. The possibility of a ceiling effect was assessed from the weighted mean and standard deviation of the primary outcome and, where available, from its elaboration in the Discussion section of each RCT.
A total of 264 trials were included in this study. Of these, 108 (41 %) provided some rationale or reference for the mean difference estimate. The most common rationales or references were the minimal clinically important difference (16 %), observational studies on the same subject (8 %) and 'clinical relevance' as judged by the authors (6 %). In a third of the trials, the weighted mean plus 1 standard deviation of the primary outcome exceeded the best value on the patient-reported outcome measure scale, indicating a possible ceiling effect in the outcome.
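The ceiling-effect criterion described above (weighted mean plus 1 standard deviation beyond the best value of the scale) can be sketched as follows. The abstract does not state how the weighting was done, so this sketch assumes the arm means are weighted by arm size and the standard deviations are pooled across arms; the function names and the example values are hypothetical.

```python
from math import sqrt


def weighted_mean(means, ns):
    """Mean of the arm means, weighted by arm sample size (an assumption)."""
    return sum(m * n for m, n in zip(means, ns)) / sum(ns)


def pooled_sd(sds, ns):
    """Pooled standard deviation across arms (an assumption)."""
    numerator = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return sqrt(numerator / denominator)


def ceiling_effect_possible(means, sds, ns, best_value, higher_is_better=True):
    """Flag a possible ceiling effect.

    True when the weighted mean, moved 1 SD toward the best score,
    goes beyond the best attainable value of the scale.
    """
    mean = weighted_mean(means, ns)
    sd = pooled_sd(sds, ns)
    if higher_is_better:
        return mean + sd > best_value
    return mean - sd < best_value


if __name__ == "__main__":
    # Hypothetical two-arm trial on a 0-100 scale where 100 is the best score.
    flagged = ceiling_effect_possible(
        means=[85, 88], sds=[15, 14], ns=[50, 50], best_value=100
    )
    print("possible ceiling effect:", flagged)
```

The `higher_is_better` flag covers scales where the best score is the minimum, such as a pain score with 0 as the best value.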
The mean difference estimates chosen for power calculations are rarely properly justified in orthopaedic trials. In general, trials with a patient-reported outcome measure as the primary outcome neither assess nor report the possibility of a ceiling effect in the primary outcome, nor elaborate on it in the Discussion section.