Technical Medical Centre, University of Twente, Enschede, The Netherlands.
J Surg Educ. 2020 Jan-Feb;77(1):189-201. doi: 10.1016/j.jsurg.2019.07.007. Epub 2019 Aug 20.
Reliable performance assessment is a necessary prerequisite for outcome-based assessment of surgical technical skill. Numerous observational instruments for technical skill assessment have been developed in recent years. However, methodological shortcomings in the reported studies may undermine the interpretation of their inter-rater reliability.
To synthesize the evidence about the inter-rater reliability of observational instruments for technical skill assessment for high-stakes decisions.
A systematic review and meta-analysis were performed. We searched Scopus (including MEDLINE) and PubMed, as well as key publications, through December 2016. We included original studies that evaluated the reliability of instruments for the observational assessment of technical skills. Two reviewers independently extracted information on the primary outcome (the reliability statistic), secondary outcomes, and general information. We calculated pooled estimates using multilevel random effects meta-analyses where appropriate.
A total of 247 documents met our inclusion criteria and provided 491 inter-rater reliability estimates. Inappropriate inter-rater reliability indices were reported for 40% of the checklist estimates, 50% of the rating-scale estimates, and 41% of the estimates for other types of assessment instruments. Only 14 documents provided sufficient information to be included in the meta-analyses. The pooled Cohen's kappa was 0.78 (95% CI 0.69-0.89, p < 0.001) and the pooled proportion agreement was 0.84 (95% CI 0.71-0.96, p < 0.001). A moderator analysis was performed to explore the type of assessment instrument as a possible source of heterogeneity.
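The results above report both Cohen's kappa and raw proportion agreement, two indices that can diverge because kappa corrects for chance agreement. As an illustration of that distinction (hypothetical pass/fail ratings, not data from this review), a minimal sketch:

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Return (proportion agreement, Cohen's kappa) for two raters
    who each rated the same items."""
    n = len(r1)
    # Observed proportion agreement: fraction of items rated identically.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected chance agreement from each rater's marginal frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    # Kappa rescales observed agreement relative to chance agreement.
    return p_o, (p_o - p_e) / (1 - p_e)

# Hypothetical example: two raters scoring ten performances as pass/fail.
rater1 = ["pass", "pass", "fail", "pass", "fail",
          "pass", "pass", "fail", "pass", "pass"]
rater2 = ["pass", "pass", "fail", "fail", "fail",
          "pass", "pass", "fail", "pass", "pass"]
p_o, kappa = cohen_kappa(rater1, rater2)
# Here p_o = 0.90 while kappa ≈ 0.78: the raters agree on 9 of 10 items,
# but part of that agreement is expected by chance alone.
```

This is why the review treats the choice of reliability index as relevant to interpretation: on the same ratings, proportion agreement is typically higher than kappa.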
For high-stakes decisions, there was often insufficient information on which to base conclusions. The use of suboptimal statistical methods and the incomplete reporting of reliability estimates do not support the use of observational technical skill assessment instruments for high-stakes decisions. Interpretations of inter-rater reliability should consider both the reliability index and the assessment instrument used. Reporting of inter-rater reliability needs to be improved through detailed descriptions of the assessment process.