Beyond reliability: assessing rater competence when using a behavioural marker system.

Authors

Smith Samantha Eve, McColgan-Smith Scott, Stewart Fiona, Mardon Julie, Tallentire Victoria Ruth

Affiliations

Centre for Medical Education, University of Dundee, Dundee, UK.

NHS Education for Scotland, Glasgow, UK.

Publication details

Adv Simul (Lond). 2024 Dec 31;9(1):55. doi: 10.1186/s41077-024-00329-9.

DOI:10.1186/s41077-024-00329-9
PMID:39736776
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11687013/
Abstract

BACKGROUND

Behavioural marker systems are used across several healthcare disciplines to assess behavioural (non-technical) skills, but rater training is variable, and inter-rater reliability is generally poor. Inter-rater reliability provides data about the tool, but not the competence of individual raters. This study aimed to test the inter-rater reliability of a new behavioural marker system (PhaBS - pharmacists' behavioural skills) with clinically experienced faculty raters and near-peer raters. It also aimed to assess rater competence when using PhaBS after brief familiarisation, by assessing completeness, agreement with an expert rater, ability to rank performance, stringency or leniency, and avoidance of the halo effect.
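
Two of the competence domains listed above, completeness and ability to rank performance, lend themselves to simple checks. The abstract does not state how each domain was scored, so the sketch below is only one plausible reading: the function names, the use of `None` for an unrated element, and the "higher total means better performance" convention are all assumptions, not the authors' definitions.

```python
from typing import Optional, Sequence


def is_complete(ratings: Sequence[Optional[int]]) -> bool:
    """Assumed rule: a rater is 'complete' if no element was left unrated."""
    return all(r is not None for r in ratings)


def ranks_correctly(good: float, mediocre: float, poor: float) -> bool:
    """Assumed rule: scenario totals must follow the scripted order.

    Higher totals are taken to mean better performance (an assumption).
    """
    return good > mediocre > poor


# Hypothetical rater: one element left blank, scenarios ordered correctly.
print(is_complete([3, None, 2]))      # False
print(ranks_correctly(30, 21, 12))    # True
```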

METHODS

Clinically experienced faculty raters and near-peer raters attended a 30-min PhaBS familiarisation session. This was immediately followed by a marking session in which they rated a trainee pharmacist's behavioural skills in three scripted immersive acute care simulated scenarios, demonstrating good, mediocre, and poor performances respectively. Inter-rater reliability in each group was calculated using the two-way random, absolute agreement, single-measures intra-class correlation coefficient (ICC). Differences in individual rater competence in each domain were compared using Pearson's chi-squared test.
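
The ICC variant named above (two-way random effects, absolute agreement, single measures) is commonly written ICC(2,1). As a rough illustration of how it is computed from a targets-by-raters table, here is a minimal pure-Python sketch. The `toy` ratings are illustrative only and are not data from this study, and the function name `icc2_1` is hypothetical.

```python
from typing import List


def icc2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    ratings[i][j] is the score given to rated target i by rater j.
    """
    n = len(ratings)       # number of rated targets (rows)
    k = len(ratings[0])    # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative ratings: 6 targets (rows) scored by 4 raters (columns).
toy = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc2_1(toy), 2))  # 0.29
```

Because this variant measures absolute agreement, systematic differences between raters (one rater scoring consistently lower than another) reduce the coefficient even when the raters order the targets identically.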

RESULTS

The ICC for experienced faculty raters was good at 0.60 (0.48-0.72) and for near-peer raters was poor at 0.38 (0.27-0.54). Of experienced faculty raters, 5/9 were competent in all domains versus 2/13 near-peer raters (difference not statistically significant). There was no statistically significant difference between the abilities of clinically experienced versus near-peer raters in agreement with an expert rater, ability to rank performance, stringency or leniency, or avoidance of the halo effect. The only statistically significant difference between groups was ability to complete the assessment (9/9 experienced faculty raters versus 6/13 near-peer raters, p = 0.0077).
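
The completion comparison (9/9 versus 6/13) can be checked with Pearson's chi-squared test on a 2×2 table. The sketch below assumes no continuity correction was applied, which the abstract does not state; the helper name `pearson_chi2_2x2` is hypothetical. For one degree of freedom the p-value is `erfc(sqrt(chi2 / 2))`, so only the standard library is needed.

```python
import math
from typing import Tuple


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (chi2 statistic, two-sided p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with df = 1
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: faculty, near-peer. Columns: completed, did not complete.
chi2, p = pearson_chi2_2x2(9, 0, 6, 7)
print(round(chi2, 2), round(p, 4))  # 7.11 0.0077
```

Under this assumption the result agrees with the reported p = 0.0077. Two of the expected cell counts fall below 5 (about 2.9 and 4.1), a situation in which Fisher's exact test is often preferred.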

CONCLUSIONS

Experienced faculty have acceptable inter-rater reliability when using PhaBS, consistent with other behavioural marker systems; however, not all raters are competent. Competence measures for other assessments can be helpfully applied to behavioural marker systems. When using behavioural marker systems for assessment, educators must start using such rater competence frameworks. This is important to ensure fair and accurate assessments for learners, to provide educators with information about rater training programmes, and to provide individual raters with meaningful feedback.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c79a/11687013/b2ff41021ab1/41077_2024_329_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c79a/11687013/55ea58d19587/41077_2024_329_Fig2_HTML.jpg
