Center for Evidence-based Healthcare, University Hospital and Faculty of Medicine Carl Gustav Carus, TU Dresden, Germany.
University Center of Orthopaedics and Traumatology, University Medicine Carl Gustav Carus Dresden, TU Dresden, Germany.
BMC Med Res Methodol. 2020 Feb 10;20(1):28. doi: 10.1186/s12874-020-0912-8.
BACKGROUND: Consensus-oriented Delphi studies are increasingly used across medical research, with a variety of rating scales and criteria for reaching consensus. We explored how three different rating scales and different consensus criteria influence which items reach consensus, and assessed the test-retest reliability of these scales, within a study aimed at identifying global treatment goals for total knee arthroplasty (TKA).
METHODS: We conducted a two-stage study consisting of two surveys and consecutively included patients scheduled for TKA at five German hospitals. Patients were asked to rate 19 potential treatment goals on three rating scales (three-point, five-point, nine-point). Both surveys were conducted within a 2-week period prior to TKA; the order of questions (scales and treatment goals) was randomized.
RESULTS: Eighty patients (mean age 68 ± 10 years; 70% female) completed both surveys. The three rating scales led to different consensus results despite moderate to high correlation between them (r = 0.65 to 0.74). Final consensus was strongly influenced by the choice of rating scale: 14 (three-point), 6 (five-point), and 15 (nine-point) of the 19 treatment goals reached the pre-defined 75% consensus threshold. The number of goals reaching consensus also varied considerably between rating scales at other consensus thresholds. Overall test-retest concordance differed between the three-point (percent agreement [p] = 88.5%, weighted kappa [k] = 0.63), five-point (p = 75.3%, k = 0.47) and nine-point scale (p = 67.8%, k = 0.78).
CONCLUSION: This study provides evidence that, within one population, consensus depends on the rating scale and the consensus threshold. The test-retest reliability of the three rating scales investigated differs substantially between individual treatment goals, and this variation in reliability can become a source of bias in consensus studies. In our setting, aimed at capturing patients' treatment goals for TKA, the three-point scale proved to be the most reasonable choice because its translation into the clinical context is the most straightforward of the three scales. Researchers conducting Delphi studies should be aware that final consensus is substantially influenced by the choice of rating scale and consensus criteria.
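The quantities reported above (share of ratings meeting a consensus threshold, percent agreement, weighted kappa) can be illustrated with a minimal Python sketch. The abstract does not state which response categories count toward consensus or which kappa weighting was used, so the sketch assumes that consensus is the share of ratings falling in a caller-supplied set of top categories and that kappa uses linear weights. All function names are hypothetical and the ratings are made-up illustration data, not study data.

```python
def reaches_consensus(ratings, top_categories, threshold=0.75):
    """Return True if the share of ratings in top_categories meets the threshold.

    Assumption: consensus is defined on the top categories of the scale;
    the abstract only gives the 75% threshold, not the category rule.
    """
    share = sum(r in top_categories for r in ratings) / len(ratings)
    return share >= threshold


def percent_agreement(test, retest):
    """Share of respondents giving the identical rating in both surveys."""
    return sum(a == b for a, b in zip(test, retest)) / len(test)


def weighted_kappa(test, retest, categories):
    """Cohen's kappa with linear disagreement weights (assumed weighting)."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(test)

    # Observed joint proportions of (test, retest) rating pairs.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(test, retest):
        obs[index[a]][index[b]] += 1 / n

    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 at maximal distance.
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        # Every rating fell into a single category; kappa is undefined.
        return float("nan")
    return 1 - d_obs / d_exp


# Made-up three-point ratings from eight hypothetical respondents.
first_survey = [3, 3, 3, 2, 1, 3, 3, 3]
second_survey = [3, 3, 2, 2, 1, 3, 3, 3]

print(reaches_consensus(first_survey, top_categories={3}))
print(percent_agreement(first_survey, second_survey))
print(weighted_kappa(first_survey, second_survey, categories=[1, 2, 3]))
```

Because kappa corrects for agreement expected by chance from the marginal distributions, a scale can show high percent agreement with a modest kappa, or the reverse, which is consistent with the two statistics ranking the scales differently in the results above.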