R. Yudkowsky is professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois; ORCID: https://orcid.org/0000-0002-2145-7582. A. Hyderi is professor, Department of Clinical Science, and founding senior associate dean for medical education, Kaiser Permanente School of Medicine, Pasadena, California; ORCID: https://orcid.org/0000-0002-8521-7510. J. Holden is research assistant, Department of Medical Education, University of Illinois at Chicago College of Medicine, and PharmD candidate, University of Illinois at Chicago College of Pharmacy, Chicago, Illinois. R. Kiser is associate director, Dr. Allan L. and Mary L. Graham Clinical Performance Center of the Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois. R. Stringham is associate professor of clinical medicine, Department of Family Medicine, and assistant dean for curriculum, University of Illinois at Chicago College of Medicine, Chicago, Illinois. A. Gangopadhyaya is assistant professor, Division of General Internal Medicine, Department of Medicine, associate clerkship director, M3 and M4 internal medicine, and associate course director, Doctoring and Clinical Skills, University of Illinois at Chicago College of Medicine, Chicago, Illinois. A. Khan is associate professor of clinical medicine, Division of General Internal Medicine, Department of Medicine, clerkship director, M3 and M4 internal medicine, and course director, Doctoring and Clinical Skills, University of Illinois at Chicago College of Medicine, Chicago, Illinois. Y.S. Park is associate professor, Department of Medical Education, University of Illinois at Chicago College of Medicine, Chicago, Illinois; ORCID: http://orcid.org/0000-0001-8583-4335.
Acad Med. 2019 Nov;94(11S Association of American Medical Colleges Learn Serve Lead: Proceedings of the 58th Annual Research in Medical Education Sessions):S21-S27. doi: 10.1097/ACM.0000000000002904.
Clinical reasoning is often assessed through patient notes (PNs) following standardized patient (SP) encounters. While nonclinicians can score PNs using analytic tools such as checklists, these do not sufficiently encompass the holistic judgments of clinician faculty. To better model faculty judgments, the authors developed checklists with faculty-specified scoring formulas embedded in spreadsheets and studied the resulting interrater reliability (IRR) of nonclinician raters (SPs and medics) and student pass/fail status.
In Study 1, nonclinician and faculty raters rescored PNs of 55 third-year medical students across 5 cases of the 2017 Graduation Competency Examination (GCE) to determine IRR. In Study 2, nonclinician raters scored all notes of the 5-case 2018 GCE (178 students). Faculty rescored all notes of failing students and could modify formula-derived scores where they judged it appropriate. Faculty also rescored and corrected scores of additional notes, for a total of 90 notes (3 cases, including failing notes).
Mean overall percent exact agreement between nonclinician and faculty ratings was 87% (weighted kappa, 0.86) and 83% (weighted kappa, 0.88) for Study 1 and Study 2, respectively. SP and medic IRRs did not differ significantly. Four students failed the note section in 2018; 3 passed after faculty corrections. Few corrections were made to nonfailing students' notes.
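The two agreement statistics reported above can be illustrated with a minimal sketch. It assumes ordinal item scores and a linearly weighted Cohen's kappa; the abstract does not state which weighting scheme the authors used, and the function names and ratings below are hypothetical illustrations, not study data.

```python
def percent_exact_agreement(rater_a, rater_b):
    """Fraction of items on which two raters gave exactly the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    `categories` lists the possible scores in rank order.
    `weighting` is "linear" or "quadratic" (assumption: the source
    does not say which scheme was used).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (rater A score, rater B score).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(rater_a, rater_b):
        observed[index[x]][index[y]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight: 0 on the diagonal, 1 at maximum distance.
            w = (abs(i - j) / (k - 1)) ** power
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Hypothetical scores (0, 1, or 2 points per checklist item).
    nonclinician = [0, 1, 2, 2, 1, 0, 2, 1]
    faculty = [0, 1, 2, 1, 1, 0, 2, 2]
    print(percent_exact_agreement(nonclinician, faculty))
    print(weighted_kappa(nonclinician, faculty, categories=[0, 1, 2]))
```

Percent exact agreement counts only identical scores, whereas weighted kappa also corrects for chance agreement and gives partial credit to near misses, which is why the two figures can move in different directions between studies.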
Nonclinician PN raters using checklists and scoring rules may provide a feasible alternative to faculty raters for low-stakes assessments and for the bulk of well-performing students. Faculty effort can be targeted strategically at rescoring notes of low-performing students and providing more detailed feedback.