Brennan Peter A, Croke David T, Reed Malcolm, Smith Lee, Munro Euan, Foulkes John, Arnett Richard
Intercollegiate Committee for Basic Surgical Examinations, The Royal College of Surgeons of England, London, United Kingdom.
Department of Quality Enhancement, The Royal College of Surgeons in Ireland, Dublin, Ireland.
J Surg Educ. 2016 Jul-Aug;73(4):616-23. doi: 10.1016/j.jsurg.2016.01.010. Epub 2016 Feb 26.
OBJECTIVE: Objective structured clinical examinations (OSCEs) are widely used for summative assessment in surgery. Despite efforts to standardize them as much as possible, variation can still occur, including in examiner scoring, and this may affect reliability. In a study of a high-stakes UK postgraduate surgical OSCE, we investigated whether examiners changing stations once during a long examining day affected marking, reliability, and candidates' overall scores compared with examiners who examined the same scenario all day.
DESIGN, SETTING, AND PARTICIPANTS: An observational study of 18,262 examiner-candidate interactions from the UK Membership of the Royal College of Surgeons examination was carried out at 3 surgical colleges across the United Kingdom. Scores between examiners were compared using analysis of variance. Examination reliability was assessed with Cronbach's alpha, and the comparative distribution of candidates' total scores for each day was evaluated using t-tests of unit-weighted z scores.
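For readers unfamiliar with the two statistics named above, the following is a minimal sketch of how Cronbach's alpha and a unit-weighted z-score composite are commonly computed. It is not the study's analysis code. The function names, the candidates-by-stations array layout, the choice to average rather than sum the standardized scores, and the demo marks are all assumptions made for illustration.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a 2-D array (rows = candidates, columns = stations).

    Standard formula: alpha = k/(k-1) * (1 - sum(item variances) / variance of totals).
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of stations (items)
    item_vars = scores.var(axis=0, ddof=1)       # sample variance of each station
    total_var = scores.sum(axis=1).var(ddof=1)   # sample variance of candidate totals
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)


def unit_weighted_z(scores):
    """Unit-weighted composite: standardize each station, then average with equal weights."""
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    return z.mean(axis=1)


if __name__ == "__main__":
    # Hypothetical marks for 5 candidates at 3 stations (made-up data, not from the study).
    demo = np.array([
        [14, 15, 13],
        [10, 12, 11],
        [18, 17, 19],
        [12, 11, 14],
        [16, 18, 15],
    ])
    print("Cronbach's alpha:", round(cronbach_alpha(demo), 3))
    print("Unit-weighted z:", np.round(unit_weighted_z(demo), 3))
```

Under these assumptions, the composite z scores from two groups of examining days could then be compared with a two-sample t-test.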
RESULTS: A significant difference was found in the absolute score differences between the morning and afternoon sessions when examiners who changed stations at lunchtime were compared with those who did not (p < 0.001). No significant differences were found for the main effects of either broad content area (p = 0.290) or station content area (p = 0.450). The reliability of each day was not affected by examiner switching (p = 0.280). Overall, no difference was found in the z-score distribution of total candidate scores across categories of examiner switching.
CONCLUSIONS: This large study found that although the range of marks awarded varied when examiners changed OSCE stations, examination reliability and the likely candidate outcome were not affected. These results may have implications for examination design and examiner experience in surgical OSCEs and beyond.