Park Yoon Soo, Hyderi Abbas, Heine Nancy, May Win, Nevins Andrew, Lee Ming, Bordage Georges, Yudkowsky Rachel
Y.S. Park is associate professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois; ORCID: http://orcid.org/0000-0001-8583-4335. A. Hyderi is associate dean for curriculum and associate professor, Department of Family Medicine, University of Illinois at Chicago College of Medicine, Chicago, Illinois. N. Heine is assistant professor, Department of Medical Education and Department of Medicine, and director, Clinical Skills Education Center, Loma Linda University School of Medicine, Loma Linda, California; ORCID: http://orcid.org/0000-0001-6812-9079. W. May is professor, Department of Medical Education, and director, Clinical Skills Education and Evaluation Center, Keck School of Medicine of the University of Southern California, Los Angeles, California. A. Nevins is clinical associate professor, Department of Medicine, Stanford University School of Medicine, Palo Alto, California. M. Lee is professor of medical education, University of California, Los Angeles David Geffen School of Medicine, Los Angeles, California. G. Bordage is professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois. R. Yudkowsky is director, Graham Clinical Performance Center, and professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois; ORCID: http://orcid.org/0000-0002-2145-7582.
Acad Med. 2017 Nov;92(11S Association of American Medical Colleges Learn Serve Lead: Proceedings of the 56th Annual Research in Medical Education Sessions):S12-S20. doi: 10.1097/ACM.0000000000001918.
To examine validity evidence of local graduation competency examination scores from seven medical schools using shared cases and to provide rater training protocols and guidelines for scoring patient notes (PNs).
Between May and August 2016, clinical cases were developed, shared, and administered across seven medical schools (990 students participated). Raters were calibrated using training protocols, and guidelines were developed collaboratively across sites to standardize scoring. Data included scores from standardized patient encounters for history taking, physical examination, and PNs. Descriptive statistics were used to examine scores from the different assessment components. Generalizability studies (G-studies) using variance components were conducted to estimate reliability for composite scores.
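As an illustration of how a G-study turns variance components into a reliability estimate, the following is a minimal sketch for a simplified, fully crossed persons-by-cases design with one score per cell. It is not the study's actual model, which also involved schools and tasks; the function name `g_study_persons_by_cases` and the toy score matrix are assumptions made for this example.

```python
def g_study_persons_by_cases(scores):
    """Estimate variance components for a crossed persons x cases design.

    `scores` is a list of rows (one per person), each a list of case scores.
    Hypothetical helper for illustration; not the analysis used in the study.
    """
    n_p = len(scores)
    n_c = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_c)
    person_means = [sum(row) / n_c for row in scores]
    case_means = [sum(row[j] for row in scores) / n_p for j in range(n_c)]

    # Mean squares from a two-way ANOVA without replication.
    ms_p = n_c * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_c = n_p * sum((m - grand) ** 2 for m in case_means) / (n_c - 1)
    ss_res = sum(
        (scores[i][j] - person_means[i] - case_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_c)
    )
    ms_res = ss_res / ((n_p - 1) * (n_c - 1))

    # Expected-mean-square solutions; negative estimates are truncated at zero.
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_c, 0.0)
    var_c = max((ms_c - ms_res) / n_p, 0.0)

    # Generalizability coefficient for relative decisions over n_c cases.
    denom = var_p + var_res / n_c
    g_coef = var_p / denom if denom > 0 else float("nan")
    return {"person": var_p, "case": var_c, "residual": var_res, "g": g_coef}


# Toy data (invented): 3 persons x 2 cases.
toy = [[1, 2], [3, 4], [5, 6]]
result = g_study_persons_by_cases(toy)
```

In this toy matrix every person gains exactly one point from the first case to the second, so the residual component is zero and all remaining score differences are attributed to persons and cases.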
Validity evidence was collected for response process (rater perception), internal structure (variance components, reliability), relations to other variables (interassessment correlations), and consequences (composite score). Student performance varied by case and task. In the PNs, justification of differential diagnosis was the most discriminating task. G-studies showed that schools accounted for less than 1% of total variance; however, for the PNs, there were differences in scores for varying cases and tasks across schools, indicating a school effect. Composite score reliability was maximized when the PN was weighted between 30% and 40%. Raters preferred using case-specific scoring guidelines with clear point-scoring systems.
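The dependence of composite score reliability on the PN weight can be sketched with the classical formula for the reliability of a weighted composite, which is swapped in here for simplicity and is not the G-theory composite analysis reported in the study. All standard deviations, reliabilities, and the correlation below are invented placeholders, and the names `composite_reliability` and `scan_pn_weight` are hypothetical.

```python
def composite_reliability(weights, sds, reliabilities, corr):
    """Reliability of a weighted composite (classical weighted-composite formula).

    weights, sds, reliabilities: one entry per component.
    corr: square correlation matrix between components (1.0 on the diagonal).
    """
    k = len(weights)
    # Variance of the weighted composite, including covariances.
    comp_var = sum(
        weights[i] * weights[j] * sds[i] * sds[j] * corr[i][j]
        for i in range(k)
        for j in range(k)
    )
    # Error variance contributed by each component, assuming uncorrelated errors.
    err_var = sum(
        (weights[i] * sds[i]) ** 2 * (1.0 - reliabilities[i]) for i in range(k)
    )
    return 1.0 - err_var / comp_var


def scan_pn_weight(sds, reliabilities, corr, n_steps=20):
    """Return (pn_weight, reliability) pairs for PN weights from 0 to 1."""
    pairs = []
    for s in range(n_steps + 1):
        w_pn = s / n_steps
        # Component order is [SP encounter score, PN score].
        rel = composite_reliability([1.0 - w_pn, w_pn], sds, reliabilities, corr)
        pairs.append((w_pn, rel))
    return pairs


# Invented placeholder values, NOT taken from the study.
sds = [1.0, 1.0]
reliabilities = [0.80, 0.60]
corr = [[1.0, 0.4], [0.4, 1.0]]
curve = scan_pn_weight(sds, reliabilities, corr)
best_weight, best_rel = max(curve, key=lambda pair: pair[1])
```

Where the maximum falls depends entirely on the inputs supplied, so reproducing the reported 30% to 40% range would require the study's own variance and covariance estimates.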
This multisite study presents validity evidence for PN scores based on a scoring rubric and case-specific scoring guidelines that offer rigor and feedback for learners. Variability in PN scores across participating sites may signal different approaches to teaching clinical reasoning among medical schools.