Carlson Jim, Tomkowiak John, Knott Patrick
Rosalind Franklin University of Medicine and Science, North Chicago, IL 60064, USA.
J Physician Assist Educ. 2010;21(2):7-14. doi: 10.1097/01367895-201021020-00002.
This study explored the reliability of two simple standard-setting methods used to set passing standards for a standardized patient (SP) exam in physician assistant (PA) education.
Fifty-four second-year PA students participated in a multistation SP-based clinical skills exam. Cut scores were set using the Angoff and Borderline Group methods. A panel of PA faculty set cut scores using the Angoff method. A modified version of the Borderline Group method set cut scores using SP global ratings verified by faculty review. Inter-rater reliability between judges was evaluated using the kappa coefficient (k) for the Angoff method and the intraclass correlation coefficient (ICC) for the Borderline Group method.
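As a rough illustration of how the two standard-setting methods are commonly computed (not the authors' actual procedure, which the abstract does not detail), the sketch below assumes that each Angoff judge estimates, per checklist item, the probability that a borderline examinee performs it correctly, and that the Borderline Group cut score is the mean exam score of examinees given a "borderline" global rating (the median is also commonly used). The function names, the rating labels, and the data layout are hypothetical.

```python
from statistics import mean


def angoff_cut_score(judge_estimates):
    """Return an Angoff cut score as a percentage.

    judge_estimates: one list per judge; each value is that judge's
    estimated probability (0 to 1) that a borderline examinee gets
    the corresponding checklist item right. (Assumed data layout.)
    """
    per_judge = [mean(items) for items in judge_estimates]
    # Average the judges' expected scores and express as a percentage.
    return 100 * mean(per_judge)


def borderline_group_cut_score(scores, global_ratings, label="borderline"):
    """Return the mean percent score of examinees rated borderline.

    scores: percent scores, one per examinee.
    global_ratings: global rating for each examinee, same order.
    label: hypothetical rating value that marks the borderline group.
    """
    borderline = [s for s, r in zip(scores, global_ratings) if r == label]
    if not borderline:
        raise ValueError("no examinees were rated borderline")
    return mean(borderline)
```

With this layout, two judges rating two items as `[[0.5, 0.7], [0.6, 0.8]]` would yield an Angoff cut score of about 65%, and a Borderline Group of examinees scoring 72 and 78 would yield a cut score of 75%. These numbers are made up for illustration and are unrelated to the study's results.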
The Borderline Group method set an overall cut score for the exam of 76% (95% CI ±5), and the Angoff method set a cut score of 62% (95% CI ±9). Both methods demonstrated sufficient inter-rater reliability (k = 0.60, ICC > 0.70; both significant at p < 0.05), although one case (preoperative history and physical) demonstrated poor inter-rater reliability between judges using the Borderline Group method.
The Borderline Group method offered a slightly more reliable cut score than the standard set by the Angoff method, but was more challenging to implement. In addition, one case demonstrated poor inter-rater reliability with the Borderline Group method. Using SPs to complete global borderline ratings offers one way to make the Borderline Group method more feasible, but requires a high degree of initial rater calibration and periodic measures of inter-rater reliability between faculty and SPs.