K.E. Hauer is associate dean for assessment and professor, Department of Medicine, University of California at San Francisco, San Francisco, California. J. Vandergrift is a health services researcher, American Board of Internal Medicine (ABIM), Philadelphia, Pennsylvania. R.S. Lipner is senior vice president of assessment and research, ABIM, Philadelphia, Pennsylvania. E.S. Holmboe is senior vice president of milestones development and evaluation, Accreditation Council for Graduate Medical Education, Chicago, Illinois. S. Hood is director of initial certification, ABIM, Philadelphia, Pennsylvania. F.S. McDonald is senior vice president of academic and medical affairs, ABIM, Philadelphia, Pennsylvania.
Acad Med. 2018 Aug;93(8):1189-1204. doi: 10.1097/ACM.0000000000002234.
To evaluate validity evidence for internal medicine milestone ratings across programs for three resident cohorts by quantifying "not assessable" ratings; reporting mean longitudinal milestone ratings for individual residents; and correlating medical knowledge ratings across training years with certification examination scores to determine predictive validity of milestone ratings for certification outcomes.
This retrospective study examined milestone ratings for postgraduate year (PGY) 1-3 residents in U.S. internal medicine residency programs. Data sources included milestone ratings, program characteristics, and certification examination scores.
Among 35,217 participants, there was a decreased percentage with "not assessable" ratings across years: 1,566 (22.5%) PGY1s in 2013-2014 versus 1,219 (16.6%) in 2015-2016 (P = .01), and 342 (5.1%) PGY3s in 2013-2014 versus 177 (2.6%) in 2015-2016 (P = .04). For individual residents with three years of ratings, mean milestone ratings increased from around 3 (behaviors of an early learner or advancing resident) in PGY1 (ranging from a mean of 2.73 to 3.19 across subcompetencies) to around 4 (ready for unsupervised practice) in PGY3 (mean of 4.00 to 4.22 across subcompetencies, P < .001 for all subcompetencies). For each increase of 0.5 units in two medical knowledge (MK1, MK2) subcompetency ratings, the difference in examination scores for PGY3s was 19.5 points for MK1 (P < .001) and 19.0 for MK2 (P < .001).
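To read the medical knowledge result on a different scale, the reported per-0.5-unit differences can be rescaled. The sketch below assumes the association is linear across the rating range, which the abstract does not state. The function and constant names are hypothetical, and only the two coefficients come from the results above.

```python
# Reported for PGY3s: exam-score difference per 0.5-unit increase in rating.
POINTS_PER_HALF_UNIT = {"MK1": 19.5, "MK2": 19.0}


def score_difference(subcompetency: str, rating_delta: float) -> float:
    """Exam-score difference implied by a rating difference.

    Assumes a linear association (an assumption, not stated in the abstract).
    """
    return POINTS_PER_HALF_UNIT[subcompetency] * (rating_delta / 0.5)


# Under that assumption, a full 1.0-unit difference in MK1 rating.
print(score_difference("MK1", 1.0))
# The reported 0.5-unit difference for MK2.
print(score_difference("MK2", 0.5))
```

Under the linearity assumption, a one-unit difference in MK1 rating corresponds to twice the reported figure. This is an association between ratings and scores and does not indicate a causal effect.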
These findings provide evidence of validity of the milestones by showing how training programs have applied them over time and how milestones predict other training outcomes.