Both authors are with the University of Central Florida College of Medicine; one is Professor of Medical Education and the other is Associate Professor of Medical Education.
J Grad Med Educ. 2023 Aug;15(4):488-493. doi: 10.4300/JGME-D-22-00678.1.
The Medical Student Performance Evaluation (MSPE), a narrative summary of each student's academic and professional performance in US medical school, is long, which makes it challenging for residency programs to evaluate large numbers of applicants.
To create a rubric to assess MSPE narratives and to compare the ability of 3 commercially available machine learning models (MLMs) to rank MSPEs in order of positivity.
Thirty out of a possible 120 MSPEs from the University of Central Florida class of 2020 were de-identified and subjected to manual scoring and ranking by a pair of faculty members using a new rubric based on the Accreditation Council for Graduate Medical Education competencies, and to global sentiment analysis by the MLMs. Correlation analysis was used to assess reliability and agreement between student rank orders produced by faculty and MLMs.
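As an illustration of how agreement between two rank orders can be quantified, the following is a minimal sketch of a rank correlation in plain Python. The abstract does not state which correlation coefficient the authors used, so Spearman's rho is an assumption here, and the function names and the example scores are hypothetical, not data from the study.

```python
def ranks(values):
    """Return 1-based ranks (highest value gets rank 1), averaging tied values."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical example: rubric totals from a faculty rater and
# positivity scores from a sentiment model for the same five narratives.
faculty_scores = [42, 38, 45, 30, 36]
model_scores = [0.91, 0.95, 0.93, 0.90, 0.94]
print(spearman(faculty_scores, model_scores))
```

A value near 1 would indicate that the two sources order the narratives similarly, while a value near 0 would indicate little agreement between the rank orders.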
The intraclass correlation coefficient used to assess faculty interrater reliability was 0.864 (P<.001; 95% CI 0.715-0.935) for total rubric scores and ranged from 0.402 to 0.768 for isolated subscales; faculty rank orders were also highly correlated (r=0.758; P<.001; 95% CI 0.539-0.881). The authors report good feasibility, as the rubric was easy to use and added minimal time to reading MSPEs. The MLMs correctly reported a positive sentiment for all 30 MSPE narratives, but their rank orders showed no significant correlations with one another or with faculty rankings.
The rubric for manual grading provided reliable overall scoring and ranking of MSPEs. The MLMs accurately detected positive sentiment in the MSPEs but were unable to provide reliable rank ordering.