School of Translational Medicine, Academy at UHSM, University of Manchester, Manchester, United Kingdom.
JAMA. 2012 Dec 5;308(21):2226-32. doi: 10.1001/jama.2012.36515.
CONTEXT: Competency-based models of education require assessments to be based on individuals' capacity to perform, yet the nature of human judgment may fundamentally limit how accurately such assessment is possible.
OBJECTIVE: To determine whether recent observations of the Mini Clinical Evaluation Exercise (Mini-CEX) performance of postgraduate year 1 physicians influence raters' scores of subsequent performances, consistent with either anchoring bias (scores biased toward previous experience) or contrast bias (scores biased away from previous experience).
DESIGN, SETTING, AND PARTICIPANTS: Internet-based randomized, blinded experiment using videos of Mini-CEX assessments of postgraduate year 1 trainees interviewing new internal medicine patients. Participants were 41 attending physicians from England and Wales experienced with the Mini-CEX, with 20 watching and scoring 3 good trainee performances and 21 watching and scoring 3 poor performances. All then watched and scored the same 3 borderline video performances. The study was completed between July and November 2011.
MAIN OUTCOME MEASURES: The primary outcome was scores assigned to the borderline videos, using a 6-point Likert scale (anchors included: 1, well below expectations; 3, borderline; 6, well above expectations). Associations were tested in a multivariable analysis that included participants' sex, years of practice, and the stringency index (within-group z score of initial 3 ratings).
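The stringency index described above is a within-group z score of each rater's initial 3 ratings, i.e., how lenient or harsh a rater's mean initial score is relative to the other raters in the same priming group. A minimal sketch of that computation follows; the function name and the example scores are illustrative, not taken from the study.

```python
from statistics import mean, pstdev

def stringency_index(rater_mean, group_means):
    """Within-group z score: a rater's mean initial rating standardized
    against the means of all raters in the same priming group."""
    mu = mean(group_means)
    sigma = pstdev(group_means)  # population SD over the group's raters
    return (rater_mean - mu) / sigma

# Hypothetical mean scores over each rater's first 3 videos, one group
group = [4.0, 3.3, 3.7, 4.3, 3.0, 3.7]
print(round(stringency_index(4.0, group), 2))
```

A positive index marks a relatively lenient rater within the group, a negative index a relatively stringent one; by construction the indices average to zero within each group.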
RESULTS: The mean rating score assigned to the borderline videos was 2.7 (95% CI, 2.4-3.0) by physicians exposed to good performances vs 3.4 (95% CI, 3.1-3.7) by those exposed to poor performances (difference, 0.67 [95% CI, 0.28-1.07]; P = .001). Borderline videos were categorized as consistent with failing scores in 33 of 60 assessments (55%) in those exposed to good performances and in 15 of 63 assessments (24%) in those exposed to poor performances (P < .001). They were categorized as consistent with passing scores in 5 of 60 assessments (8.3%) in those exposed to good performances compared with 25 of 63 assessments (39.7%) in those exposed to poor performances (P < .001). Sex and years of attending practice were not associated with scores. The priming condition (good vs poor performances) and the stringency index jointly accounted for 45% of the observed variation in raters' scores for the borderline videos (P < .001).
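The failing-score proportions reported above (33/60 after good videos vs 15/63 after poor videos) can be sanity-checked with a pooled two-proportion z test. The abstract does not state which test the authors used, so this is an illustrative check, not the study's analysis.

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Two-sided two-proportion z test with a pooled proportion;
    a rough check, not necessarily the study's exact test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    pval = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, pval

# Failing categorizations: good-video group vs poor-video group
z, p = two_prop_z(33, 60, 15, 63)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The resulting P value falls well below .001, consistent with the significance level reported in the abstract.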
CONCLUSIONS: In an experimental setting, attending physicians exposed to videos of good medical trainee performances rated subsequent borderline performances lower than those who had been exposed to poor performances, consistent with a contrast bias.