Inconsistencies in rater-based assessments mainly affect borderline candidates: but using simple heuristics might improve pass-fail decisions.

Affiliations

Centre for Health Sciences Education, Faculty of Medicine, University of Oslo, Oslo, Norway.

Centre for Educational Measurement (CEMO), Faculty of Educational Sciences, University of Oslo, Oslo, Norway.

Publication information

Adv Health Sci Educ Theory Pract. 2024 Nov;29(5):1749-1767. doi: 10.1007/s10459-024-10328-0. Epub 2024 Apr 23.

DOI:10.1007/s10459-024-10328-0
PMID:38649529
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11549209/
Abstract

INTRODUCTION

Research in various areas indicates that expert judgment can be highly inconsistent. However, expert judgment is indispensable in many contexts. In medical education, experts often function as examiners in rater-based assessments. Here, disagreement between examiners can have far-reaching consequences. The literature suggests that inconsistencies in ratings depend on the level of performance a to-be-evaluated candidate shows. This possibility has not been addressed deliberately and with appropriate statistical methods. By adopting the theoretical lens of ecological rationality, we evaluate if easily implementable strategies can enhance decision making in real-world assessment contexts.

METHODS

We address two objectives. First, we investigate the dependence of rater consistency on performance levels. We recorded videos of mock exams, had examiners (N=10) evaluate four students' performances, and compared inconsistencies in performance ratings between examiner pairs using a bootstrapping procedure. Our second objective is to provide an approach that aids decision making by implementing simple heuristics.
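As a rough illustration of the kind of bootstrapping procedure mentioned above, the following Python sketch estimates the mean absolute rating difference across examiner pairs for each candidate and attaches a percentile bootstrap interval. It is not the authors' analysis: the rating scale, the ratings themselves, the function names, and the choice to resample examiners are all assumptions made for the example.

```python
import itertools
import random
import statistics


def pairwise_abs_diffs(ratings):
    """Absolute rating difference for every unordered examiner pair."""
    return [abs(a - b) for a, b in itertools.combinations(ratings, 2)]


def bootstrap_mean_discrepancy(ratings, n_boot=2000, seed=0):
    """Percentile bootstrap of the mean pairwise discrepancy.

    Examiners are resampled with replacement (an assumption of this sketch).
    Returns (point_estimate, (lower, upper)) for a 95% interval.
    """
    rng = random.Random(seed)
    point = statistics.mean(pairwise_abs_diffs(ratings))
    boot = []
    for _ in range(n_boot):
        sample = [rng.choice(ratings) for _ in ratings]
        boot.append(statistics.mean(pairwise_abs_diffs(sample)))
    boot.sort()
    lower = boot[int(0.025 * n_boot)]
    upper = boot[int(0.975 * n_boot) - 1]
    return point, (lower, upper)


# Hypothetical ratings from 10 examiners; invented values, not study data.
ratings_by_candidate = {
    "borderline": [2, 4, 3, 5, 2, 4, 3, 5, 2, 4],
    "excellent": [6, 6, 5, 6, 6, 5, 6, 6, 6, 5],
}

results = {
    name: bootstrap_mean_discrepancy(ratings)
    for name, ratings in ratings_by_candidate.items()
}

for name, (point, (lower, upper)) in results.items():
    print(f"{name}: mean pair discrepancy {point:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

Resampling examiners with replacement can draw the same examiner twice, which adds zero-difference pairs and pulls the bootstrap distribution slightly downward. A more careful analysis might resample examiner pairs directly or use a different interval method.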

RESULTS

We found that discrepancies were largely a function of the level of performance the candidates showed. Lower performances were rated more inconsistently than excellent performances. Furthermore, our analyses indicated that the use of simple heuristics might improve decisions in examiner pairs.

DISCUSSION

Inconsistencies in performance judgments continue to be a matter of concern, and we provide empirical evidence for them to be related to candidate performance. We discuss implications for research and the advantages of adopting the perspective of ecological rationality. We point to directions both for further research and for development of assessment practices.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5da/11549209/2ab296aeab69/10459_2024_10328_Fig2_HTML.jpg

Similar articles

1
Inconsistencies in rater-based assessments mainly affect borderline candidates: but using simple heuristics might improve pass-fail decisions.
Adv Health Sci Educ Theory Pract. 2024 Nov;29(5):1749-1767. doi: 10.1007/s10459-024-10328-0. Epub 2024 Apr 23.
2
Can physician examiners overcome their first impression when examinee performance changes?
Adv Health Sci Educ Theory Pract. 2018 Oct;23(4):721-732. doi: 10.1007/s10459-018-9823-4. Epub 2018 Mar 20.
3
Exploring the role of first impressions in rater-based assessments.
Adv Health Sci Educ Theory Pract. 2014 Aug;19(3):409-27. doi: 10.1007/s10459-013-9453-9. Epub 2013 Mar 26.
4
Inter-rater variability as mutual disagreement: identifying raters' divergent points of view.
Adv Health Sci Educ Theory Pract. 2017 Oct;22(4):819-838. doi: 10.1007/s10459-016-9711-8. Epub 2016 Sep 20.
5
The influence of first impressions on subsequent ratings within an OSCE station.
Adv Health Sci Educ Theory Pract. 2017 Oct;22(4):969-983. doi: 10.1007/s10459-016-9736-z. Epub 2016 Nov 15.
6
A method for identifying extreme OSCE examiners.
Clin Teach. 2013 Feb;10(1):27-31. doi: 10.1111/j.1743-498X.2012.00607.x.
7
Inter-rater reliability in clinical assessments: do examiner pairings influence candidate ratings?
BMC Med Educ. 2020 May 11;20(1):147. doi: 10.1186/s12909-020-02009-4.
8
Standardized examinees: development of a new tool to evaluate factors influencing OSCE scores and to train examiners.
GMS J Med Educ. 2020 Jun 15;37(4):Doc40. doi: 10.3205/zma001333. eCollection 2020.
9
"On the same page"? The effect of GP examiner feedback on differences in rating severity in clinical assessments: a pre/post intervention study.
BMC Med Educ. 2017 Jun 6;17(1):101. doi: 10.1186/s12909-017-0929-9.
10
From aggregation to interpretation: how assessors judge complex data in a competency-based portfolio.
Adv Health Sci Educ Theory Pract. 2018 May;23(2):275-287. doi: 10.1007/s10459-017-9793-y. Epub 2017 Oct 14.
