ProOrtho Clinic, Kirkland, Washington, U.S.A.
Desert Orthopedic Center, Palm Desert, California, U.S.A.
Arthroscopy. 2021 Nov;37(11):3241-3247. doi: 10.1016/j.arthro.2021.04.060. Epub 2021 May 5.
The purpose of our study was to compare real-time, live observational scoring with delayed retrospective video review of operative performance and to determine whether the evaluation method affected the attainment of proficiency benchmarks.
Sixteen arthroscopy/sports medicine fellows and 2 senior residents completed training to perform arthroscopic Bankart repairs (ABRs) and arthroscopic rotator cuff repairs (ARCRs) using a proficiency-based progression curriculum. The final operative performances for 15 randomly selected ABRs and 13 randomly selected ARCRs, all performed on cadavers, were each scored live (by observation during the operative performance) and again on delayed video review (6-8 weeks later) by 1 of 15 trained raters using validated metric-based (step and error) assessment tools. The inter-rater reliability (IRR) of live versus video review by a single rater was calculated, and any changes to the trainee's attainment of the proficiency benchmarks were noted. The correlation coefficient (r) and the R were also calculated for the paired scores from the randomly selected performances.
No significant differences in the observed IRR agreement or the attainment of the proficiency benchmarks were found when comparing live with video assessment for either ABR or ARCR. The correlation coefficients r and R were considerably lower than the agreement coefficient (IRR) for rotator cuff steps (e.g., R = 0.74 vs. IRR = 0.97, P = 0.001), Bankart errors (R = 0.73 vs. IRR = 0.98, P = 0.006), and rotator cuff errors (R = 0.48 vs. IRR = 0.98, P = 0.0002).
Real-time live and delayed video-based scoring of operative performance are essentially equivalent for the metric-based assessments of operative performance in ABRs and ARCRs. When the IRR agreement coefficient was compared with the correlation coefficients, the former was found to have greater homogeneity and measurement precision.
Metric-based live scoring is reliable and accurate for operative performance assessment, including high-stakes evaluations.
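The contrast between the IRR agreement coefficient and the correlation coefficients can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: that IRR is computed per performance as agreements / (agreements + disagreements) over binary item scores, and that the correlation is Pearson's r on the per-performance totals. The helper names and the four ten-item performances are hypothetical and are not study data.

```python
from math import sqrt


def proportion_agreement(live, video):
    """Assumed IRR form: agreements / (agreements + disagreements)."""
    agreements = sum(a == b for a, b in zip(live, video))
    return agreements / len(live)


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def items(error_positions, n_items=10):
    """Binary item scores: 1 where an error was scored, 0 elsewhere."""
    return [1 if i in error_positions else 0 for i in range(n_items)]


# Hypothetical paired scorings of four performances (illustrative only).
live = [items({0}), items({0, 1}), items({0}), items({0, 1})]
video = [items({0}), items({0}), items({0, 1}), items({0, 1})]

# Item-level agreement per performance, then averaged.
per_performance = [proportion_agreement(l, v) for l, v in zip(live, video)]
mean_irr = sum(per_performance) / len(per_performance)

# Correlation of the per-performance error totals.
r = pearson_r([sum(l) for l in live], [sum(v) for v in video])

print(f"mean IRR = {mean_irr:.2f}, r = {r:.2f}")
```

In this toy data the raters disagree on only 2 of 40 items, so the mean agreement is 0.95, yet the error totals span such a narrow range that r is 0. This shows one way high agreement can coexist with a low correlation. It does not reproduce the study's analysis.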