S.A.W. Andersen is postdoctoral researcher, Copenhagen Academy for Medical Education and Simulation (CAMES), Center for Human Resources and Education, Capital Region of Denmark, and Department of Otolaryngology, The Ohio State University, Columbus, Ohio, and resident in otorhinolaryngology, Department of Otorhinolaryngology-Head & Neck Surgery, Rigshospitalet, Copenhagen, Denmark; ORCID: https://orcid.org/0000-0002-3491-9790.
L.J. Nayahangan is researcher, CAMES, Center for Human Resources and Education, Capital Region of Denmark, Copenhagen, Denmark; ORCID: https://orcid.org/0000-0002-6179-1622.
Acad Med. 2021 Nov 1;96(11):1609-1619. doi: 10.1097/ACM.0000000000004150.
Competency-based education relies on the validity and reliability of assessment scores. Generalizability (G) theory is well suited to exploring the reliability of assessment tools in medical education but has so far been applied only to a limited extent. This study aimed to systematically review the literature on the use of G theory to explore the reliability of structured assessment of medical and surgical technical skills and to assess the relative contributions of different factors to variance.
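For orientation, a minimal sketch of the framework in standard G theory notation (not specific to the included studies): in a fully crossed person-by-rater design, each observed score is decomposed into variance components attributable to persons (p), raters (r), and the person-by-rater interaction confounded with residual error (pr,e). The G coefficient for relative (rank-ordering) decisions, and the Phi coefficient for absolute decisions, express person (true-score) variance as a proportion of itself plus the relevant error variance, which shrinks as scores are averaged over n_r raters:

X_{pr} = \mu + \nu_p + \nu_r + \nu_{pr,e}, \qquad \sigma^2(X_{pr}) = \sigma^2_p + \sigma^2_r + \sigma^2_{pr,e}

E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pr,e}/n_r}, \qquad \Phi = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_r/n_r + \sigma^2_{pr,e}/n_r}

Designs with additional facets (e.g., performances, cases, occasions) extend the decomposition with further variance components but follow the same logic.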
In June 2020, 11 databases, including PubMed, were searched from inception through May 31, 2020. Eligible studies used G theory to explore reliability in the context of assessment of medical and surgical technical skills. Descriptive information on study characteristics, assessment context, assessment protocol, participants being assessed, and G analyses was extracted. These data were used to map the use of G theory and to explore variance components analyses. A meta-analysis was conducted to synthesize the extracted data on sources of variance and reliability.
Forty-four studies were included; of these, 39 had sufficient data for meta-analysis. The total pool comprised 35,284 unique assessments of 31,496 unique performances by 4,154 participants. Person variance had a pooled effect of 44.2% (95% confidence interval [CI], 36.8%-51.5%). Only assessment tool type (Objective Structured Assessment of Technical Skills-type vs task-based checklist-type) had a significant effect on person variance. The pooled reliability (G coefficient) was 0.65 (95% CI, 0.59-0.70). Most studies (39, 88.6%) included decision studies, which generally favored higher ratios of performances to assessors to achieve sufficiently reliable assessment.
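As a rough, back-of-the-envelope illustration of how these pooled estimates relate (our simplification, not a calculation reported in the review): if person variance is 44.2% of total variance and, in the simplest single-facet case, all remaining variance is treated as relative error, a decision study averaging over n observations gives

E\rho^2(n) = \frac{0.442}{0.442 + 0.558/n}, \qquad E\rho^2(1) \approx 0.44, \quad E\rho^2(2) \approx 0.61, \quad E\rho^2(3) \approx 0.70.

On this simplification, the pooled G coefficient of 0.65 is consistent with designs averaging over roughly two to three observations per participant; actual designs partition error across multiple facets, so this is illustrative only.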
G theory is increasingly being used to examine the reliability of technical skills assessment in medical education, but more rigor in reporting is warranted. Contextual factors can affect variance components and thereby reliability estimates and should therefore be considered, especially in high-stakes assessment. Reliability analysis should be best practice when developing assessments of technical skills.