Centre for Health Sciences Education, Faculty of Medicine, University of Oslo, Oslo, Norway.
Centre for Educational Measurement (CEMO), Faculty of Educational Sciences, University of Oslo, Oslo, Norway.
Adv Health Sci Educ Theory Pract. 2024 Nov;29(5):1749-1767. doi: 10.1007/s10459-024-10328-0. Epub 2024 Apr 23.
Research in various areas indicates that expert judgment can be highly inconsistent. However, expert judgment is indispensable in many contexts. In medical education, experts often function as examiners in rater-based assessments, where disagreement between examiners can have far-reaching consequences. The literature suggests that inconsistencies in ratings depend on the level of performance the candidate being evaluated shows, but this possibility has not yet been addressed deliberately and with appropriate statistical methods. Adopting the theoretical lens of ecological rationality, we evaluate whether easily implementable strategies can enhance decision making in real-world assessment contexts.
We address two objectives. First, we investigate the dependence of rater consistency on performance levels. We recorded videos of mock exams, had examiners (N=10) evaluate four students' performances, and compared inconsistencies in performance ratings between examiner pairs using a bootstrapping procedure. Our second objective is to provide an approach that aids decision making by implementing simple heuristics.
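To make the bootstrapping comparison concrete, the following is a minimal sketch of how examiner-pair discrepancies per candidate could be resampled; the rating values, the 0-100 scale, and the use of absolute pairwise differences are illustrative assumptions, and the paper's exact procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings: rows = examiners (N=10), columns = candidates (4).
# All values are fabricated for illustration only.
ratings = np.array([
    [62, 71, 55, 88], [58, 74, 49, 90], [65, 69, 60, 87], [55, 72, 52, 91],
    [60, 75, 47, 89], [63, 70, 58, 86], [57, 73, 50, 92], [61, 68, 56, 88],
    [59, 76, 53, 90], [64, 71, 51, 87],
])

def pairwise_discrepancies(scores):
    """Absolute rating differences over all examiner pairs for one candidate."""
    n = len(scores)
    return np.array([abs(scores[i] - scores[j])
                     for i in range(n) for j in range(i + 1, n)])

def bootstrap_mean_discrepancy(scores, n_boot=5000):
    """Bootstrap distribution of the mean examiner-pair discrepancy."""
    diffs = pairwise_discrepancies(scores)
    resamples = rng.choice(diffs, size=(n_boot, len(diffs)), replace=True)
    return resamples.mean(axis=1)

for c in range(ratings.shape[1]):
    boot = bootstrap_mean_discrepancy(ratings[:, c])
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"Candidate {c + 1}: mean pair discrepancy "
          f"{boot.mean():.1f} (95% CI {lo:.1f}-{hi:.1f})")
```

Comparing these bootstrapped discrepancy distributions across candidates is one way to test whether rating inconsistency varies with performance level, as the abstract describes.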
We found that discrepancies were largely a function of the level of performance the candidates showed: lower performances were rated more inconsistently than excellent performances. Furthermore, our analyses indicated that the use of simple heuristics might improve decisions made by examiner pairs.
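The abstract does not specify which simple heuristics were examined, so the following is a purely hypothetical illustration of the kind of rule meant: a tolerance check on the pair's discrepancy, with averaging when the examiners roughly agree and referral otherwise. The pass mark and discrepancy limit are invented parameters.

```python
def heuristic_pair_decision(score_a, score_b, pass_mark=60, discrepancy_limit=10):
    """Illustrative (hypothetical) heuristic for combining an examiner pair's scores.

    Returns a (decision, rationale) tuple; decision is None when the pair's
    discrepancy exceeds the tolerance and the case should be referred onward.
    """
    if abs(score_a - score_b) <= discrepancy_limit:
        passed = (score_a + score_b) / 2 >= pass_mark
        return passed, "average of pair"
    return None, "discrepancy too large: refer to an additional examiner"

# Example: a borderline, inconsistently rated performance gets referred.
print(heuristic_pair_decision(52, 68))   # (None, 'discrepancy too large: ...')
print(heuristic_pair_decision(88, 91))   # (True, 'average of pair')
```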
Inconsistencies in performance judgments continue to be a matter of concern, and we provide empirical evidence that they are related to the candidate's level of performance. We discuss implications for research and the advantages of adopting the perspective of ecological rationality, and we point to directions both for further research and for the development of assessment practices.