Primary Care Clinical Unit, Faculty of Medicine, The University of Queensland, Brisbane, Queensland, Australia.
Royal Brisbane & Women's Hospitals, Level 8, Health Sciences Building, Herston, QLD, Australia.
BMC Med Educ. 2017 Jun 6;17(1):101. doi: 10.1186/s12909-017-0929-9.
Robust and defensible clinical assessments attempt to minimise differences in student grades that are due to differences in examiner severity (stringency and leniency). Unfortunately, there is little evidence to date that examiner training and feedback interventions are effective; "physician raters" have indeed been deemed "impervious to feedback". Our aim was to investigate the effectiveness of a feedback intervention for general practitioner examiners and to explore examiner attitudes towards it.
Sixteen examiners were provided with a written summary of all examiner ratings awarded in medical student clinical case examinations over the preceding 18 months, enabling them to identify their own rating data and compare it with that of other examiners. Examiner ratings and examiner severity self-estimates were analysed pre- and post-intervention using non-parametric bootstrapping, multivariable linear regression, intra-class correlation, and Spearman's correlation analyses. Examiners also completed a survey exploring their perceptions of the usefulness and acceptability of the intervention, including what (if anything) they planned to do differently as a result of the feedback.
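To make the analyses listed above more concrete, the Python sketch below illustrates one way such an examiner-severity analysis could be set up. It is not the authors' code: the file names, column names (examiner, case, score, phase, self_estimate) are hypothetical, and the residual-based severity index is a simplification of the multivariable regression adjustment for case difficulty described in the abstract.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical ratings table: one row per examiner-case rating, with columns
# 'examiner', 'case', 'score', and 'phase' ('pre' or 'post'). Names are illustrative.
ratings = pd.read_csv("examiner_ratings.csv")

# Crude severity index: each examiner's mean deviation from the case mean.
# This partially adjusts for case difficulty; the study itself used
# multivariable linear regression for that adjustment.
ratings["case_mean"] = ratings.groupby(["phase", "case"])["score"].transform("mean")
ratings["residual"] = ratings["score"] - ratings["case_mean"]
severity = ratings.groupby(["phase", "examiner"])["residual"].mean().unstack("phase")

# Spearman correlation between severity self-estimates (assumed to sit in a
# separate table keyed by examiner) and measured pre-intervention severity.
self_est = pd.read_csv("self_estimates.csv", index_col="examiner")["self_estimate"]
rho, p = stats.spearmanr(self_est.reindex(severity.index), severity["pre"])
print(f"Spearman rho (self-estimate vs measured severity, pre): {rho:.2f} (p={p:.3f})")

# Non-parametric bootstrap CI for the mean pre-to-post change in severity.
change = (severity["post"] - severity["pre"]).dropna().to_numpy()
rng = np.random.default_rng(0)
boot = [rng.choice(change, size=change.size, replace=True).mean() for _ in range(10_000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Mean severity change: {change.mean():.2f} (95% bootstrap CI {lo:.2f} to {hi:.2f})")
```

A positive residual-based severity index here would indicate a relatively lenient examiner and a negative one a relatively stringent examiner; the bootstrap interval is one simple way to express uncertainty in the pre-to-post shift without distributional assumptions.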
Examiner severity self-estimates correlated relatively poorly with measured severity on the two clinical case examination types pre-intervention (0.29 and 0.67) and were less accurate post-intervention. No significant effect of the intervention was identified once differences in case difficulty were controlled for, although there were fewer outlier examiners post-intervention. Drift in examiner severity over time was observed prior to the intervention. Participants rated the intervention as interesting and useful, and survey comments indicated that fairness, reassurance, and understanding their examiner colleagues are important to examiners.
Although our participants were receptive to the feedback and wanted to be "on the same page", we found no evidence that they used it to change their rating behaviours. Calibration of severity appears to be difficult for examiners, and further research into more effective ways of providing feedback is indicated.