Intensive Care Unit, The Alfred Hospital, Melbourne, Australia.
Department of Epidemiology and Preventative Medicine, School of Public Health, Monash University, Melbourne, Australia.
BMC Med Educ. 2024 May 11;24(1):527. doi: 10.1186/s12909-024-05516-w.
High-stakes examinations used to credential trainees for independent specialist practice should be evaluated periodically to ensure that defensible decisions are made. This study aims to quantify the reliability coefficient of the College of Intensive Care Medicine of Australia and New Zealand (CICM) Hot Case examination and to evaluate the contributions to variance from candidates, cases and examiners.
This retrospective, de-identified analysis of CICM examination data used descriptive statistics and generalisability theory to evaluate the reliability of the Hot Case examination component. Decision studies were used to project generalisability coefficients for alternative examination designs.
Examination results from 2019 to 2022 included 592 Hot Cases, totalling 1184 individual examiner scores. The mean examiner Hot Case score was 5.17 (standard deviation 1.65). The correlation between candidates' two Hot Case scores was low (0.30). The overall reliability coefficient for the Hot Case component, which consists of two cases each observed by a separate pair of examiners, was 0.42. Sources of variance included candidate proficiency (25%), case difficulty and case specificity (63.4%), examiner stringency (3.5%) and other error (8.2%). To achieve a reliability coefficient greater than 0.8, a candidate would need to perform 11 Hot Cases, each observed by two examiners.
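The decision-study projection can be illustrated with a minimal sketch. It assumes a simple design in which case-related variance is averaged over the number of cases, and examiner and residual variance are averaged over cases times examiners. This formula and the function name `projected_g` are illustrative assumptions, not the model reported by the authors. Because the published variance percentages are rounded, the output only approximates the reported coefficients.

```python
# Hypothetical decision-study sketch using the rounded variance shares
# from the abstract (expressed as proportions of total variance).
# Assumption: examiners are nested within cases, so examiner and residual
# variance are divided by (cases x examiners); this is not confirmed
# as the authors' exact model.

def projected_g(n_cases, n_examiners,
                var_candidate=0.25,   # candidate proficiency
                var_case=0.634,       # case difficulty and case specificity
                var_examiner=0.035,   # examiner stringency
                var_other=0.082):     # other error
    """Project a generalisability coefficient for a given design."""
    error = (var_case / n_cases
             + (var_examiner + var_other) / (n_cases * n_examiners))
    return var_candidate / (var_candidate + error)

if __name__ == "__main__":
    # Compare the current two-case design with longer examinations.
    for n in (2, 4, 8, 11):
        print(f"{n:2d} cases, 2 examiners each: G = {projected_g(n, 2):.2f}")
```

Under these assumptions the two-case design gives a coefficient of roughly 0.42, and about 11 cases are needed to approach 0.8, which agrees with the reported figures to within rounding. The sketch also shows why adding cases helps far more than adding examiners: the case term dominates the error variance.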
The reliability coefficient for the Hot Case component of the CICM second part examination is below the generally accepted value for a high-stakes examination. Modifying case selection and introducing a clear scoring rubric to mitigate the effects of variation in case difficulty may be helpful. Increasing the number of cases and the overall assessment time appears to be the best way to increase overall reliability. Further research is required to assess the combined reliability of the Hot Case and viva components.