Division of General Surgery, Department of Surgery, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; The Ottawa Hospital, Ottawa, Ontario, Canada; Department of Innovation in Medical Education (DIME), University of Ottawa, Ottawa, Ontario, Canada.
Division of General Surgery, Department of Surgery, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada; The Ottawa Hospital, Ottawa, Ontario, Canada.
J Surg Educ. 2019 Jul-Aug;76(4):1088-1093. doi: 10.1016/j.jsurg.2019.01.001. Epub 2019 Jan 29.
The inter-rater reliability (IRR) of laparoscopic skills assessment is usually established using motivated raters from a single subspecialty practice group who have significant experience with similar tools. The purpose of this study was to determine the IRR among attending surgeons with different levels of experience and types of practice, the extent of rater training necessary to achieve good IRR, and whether the effect of rater training is retained over periods of nonuse.
In Part 1, 5 surgeons of different practice backgrounds assessed 3 laparoscopic cholecystectomy videos using the Global Operative Assessment of Laparoscopic Skills instrument. In Part 2, 2 of the surgeons assessed a total of 33 videos over 5 scoring sessions distributed across 6 months. They received 2 different types of rater training (before scoring sessions #1 and #3), and retention was tested in the remaining 3 scoring sessions. IRR for Parts 1 and 2 was calculated as an intraclass correlation coefficient (ICC) under a 2-way random-effects model.
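To make the reliability statistic concrete, the sketch below computes a 2-way random-effects ICC from a complete matrix of ratings (rows are videos, columns are raters) using the mean squares from a two-way ANOVA. The abstract does not say whether single-measure or average-measure ICC, or absolute-agreement or consistency definitions, were reported, so this sketch assumes the absolute-agreement forms described by Shrout and Fleiss (1979) and returns both ICC(2,1) and ICC(2,k). The function name `icc_two_way_random` is hypothetical, and the example data are the illustrative 6-target, 4-judge table from Shrout and Fleiss, not data from this study.

```python
from typing import Sequence


def icc_two_way_random(ratings: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Return (ICC(2,1), ICC(2,k)) for an n-subjects x k-raters matrix.

    Assumption: two-way random-effects, absolute-agreement forms
    (Shrout & Fleiss 1979). Rows are rated videos, columns are raters;
    every cell must be filled (no missing ratings).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 subjects and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects (rows),
    # raters (columns), and residual error.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Single-rater reliability, ICC(2,1); rater bias (ms_cols) lowers agreement.
    single = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    # Reliability of the mean of k raters, ICC(2,k).
    average = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    return single, average


if __name__ == "__main__":
    # Illustrative table from Shrout & Fleiss (1979): 6 targets x 4 judges.
    scores = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    single, average = icc_two_way_random(scores)
    print(round(single, 2), round(average, 2))  # 0.29 0.62
```

Because the numerator is a difference of mean squares, the estimate can fall below zero when between-rater disagreement exceeds the true spread between videos, which is how a value such as the ICC of -0.17 reported in the results can arise.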
The ICC for Part 1 was poor (ICC = 0.26). In Part 2, the ICC was highest after each training session (scoring #1 ICC = 0.76, scoring #3 ICC = 0.74). The ICC was not retained 1.5 months after the brief video-based training session (scoring #2 ICC = -0.17). The ICC was retained 2.5 months after the in-depth discussion training session (scoring #4 ICC = 0.70), but not 4.5 months later (scoring #5 ICC = 0.04).
Good IRR is not implicit among surgeons with varying backgrounds and experience. Good IRR can be achieved with different types of rater training, but the effect of rater training is lost during periods of nonuse. This suggests a need for further study of the IRR of technical skills assessment when performed by the wide variety of surgeon raters commonly encountered in postgraduate resident assessment.