Dr. Ginsburg is professor, Department of Medicine, and scientist, Wilson Centre for Research in Education, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada. Dr. Eva is professor, Department of Medicine, and senior scientist, Centre for Health Education Scholarship, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada. Dr. Regehr is professor, Department of Surgery, and associate director, Centre for Health Education Scholarship, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada.
Acad Med. 2013 Oct;88(10):1539-44. doi: 10.1097/ACM.0b013e3182a36c3d.
Although scores on in-training evaluation reports (ITERs) are often criticized for poor reliability and validity, ITER comments may yield valuable information. The authors assessed the across-rotation reliability of ITER scores in one internal medicine program, the ability of ITER scores and comments to predict postgraduate year three (PGY3) performance, and the reliability and incremental predictive validity of attending physicians' analyses of written comments.
Numeric and narrative data from the first two years of ITERs for one cohort of residents at the University of Toronto Faculty of Medicine (2009-2011) were assessed for reliability and for predictive validity with respect to third-year performance. Twenty-four faculty attendings rank-ordered the comments (without seeing the scores) such that each resident was ranked by three faculty members. Mean ITER scores and comment rankings were submitted to regression analyses; the dependent variables were PGY3 ITER scores and program directors' rankings.
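The regression logic described above (scores and comment rankings as predictors, PGY3 outcomes as the dependent variable, with incremental validity judged by the change in explained variance) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis; the variable names, distributions, and effect sizes are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the study's data (all values illustrative):
# mean ITER scores for each resident in PGY1 and PGY2, and a PGY3
# outcome constructed to correlate with both.
n = 63
pgy1 = rng.normal(3.8, 0.3, n)
pgy2 = 0.6 * pgy1 + rng.normal(1.5, 0.25, n)
pgy3 = 0.4 * pgy1 + 0.4 * pgy2 + rng.normal(0.8, 0.3, n)

def r_squared(X, y):
    """Ordinary least-squares R^2, fitting with an intercept column."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

# Variance in the PGY3 outcome explained jointly by PGY1 and PGY2 scores.
r2_scores = r_squared(np.column_stack([pgy1, pgy2]), pgy3)

# Incremental predictive validity of an added predictor (here a
# hypothetical comment-ranking variable, uncorrelated by construction)
# = R^2 of the full model minus R^2 of the scores-only model.
comments = rng.normal(0, 1, n)
r2_full = r_squared(np.column_stack([pgy1, pgy2, comments]), pgy3)
incremental = r2_full - r2_scores
```

In a hierarchical regression of this form, a near-zero `incremental` value corresponds to the study's finding that comment rankings did not improve prediction beyond the scores.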
Reliabilities of ITER scores across nine rotations for 63 residents were 0.53 for both postgraduate year one (PGY1) and postgraduate year two (PGY2). Interrater reliabilities across three attendings' rankings were 0.83 for PGY1 and 0.79 for PGY2. There were strong correlations between ITER scores and comments within each year (0.72 and 0.70). Regressions revealed that PGY1 and PGY2 ITER scores collectively explained 25% of variance in PGY3 scores and 46% of variance in PGY3 rankings. Comment rankings did not improve predictions.
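The across-rotation reliabilities reported above are internal-consistency coefficients, treating each rotation's ITER score as a repeated measurement of the same resident. A minimal sketch of such a coefficient (Cronbach's alpha, computed here on synthetic data with an assumed stable-ability component plus rotation-specific noise) looks like this; the sample sizes mirror the study, but the scores themselves are invented for illustration.

```python
import numpy as np

def cronbach_alpha(scores):
    """Internal-consistency reliability across rotations.

    scores: (n_residents, n_rotations) array, where each column holds one
    rotation's ITER score for every resident.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()   # per-rotation variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of row sums
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
n_residents, n_rotations = 63, 9                    # matches the study's design
trait = rng.normal(0.0, 0.4, (n_residents, 1))      # stable resident ability
noise = rng.normal(0.0, 0.5, (n_residents, n_rotations))  # rotation error
alpha = cronbach_alpha(4.0 + trait + noise)
```

The resulting alpha rises with the number of rotations and with the ratio of stable-ability variance to rotation-specific noise, which is why aggregating nine rotations can yield usable reliability even when any single ITER is noisy.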
ITER scores aggregated across multiple rotations showed reasonable reliability and predictive validity. Comment rankings did not add incremental predictive power, but the correlation analyses suggest that trainee performance can be assessed through these comments.