Feigerlova Eva
Centre Universitaire d'Enseignement par Simulation - CUESim, Faculté de Médecine, Maïeutique et Métiers de la Santé, Vandoeuvre-lès-Nancy, 54505, France.
Université de Lorraine, Inserm, DCAC, Vandoeuvre-lès-Nancy, 54505, France.
BMC Med Educ. 2025 May 2;25(1):643. doi: 10.1186/s12909-025-07255-y.
The national objective structured clinical examination (OSCE) has recently been adopted in France as a prerequisite for medical students to enter accredited graduate education programs. However, the reliability and generalizability of OSCE scores have not been well explored in relation to the national examination blueprint.
To obtain complementary information for monitoring and improving the quality of the OSCE, we performed a pilot study applying generalizability (G-)theory to a sample of 6th-year undergraduate medical students (n = 73) who were assessed by 24 examiner pairs at three stations. Based on the national blueprint, three scoring subunits (a dichotomous task-specific checklist evaluating clinical skills, behaviorally anchored scales evaluating generic skills, and a global performance scale) were used to evaluate students and were combined into a station score. A variance component analysis was performed using mixed modelling to identify the impact of different facets (station, student, and student × station interaction) on the scoring subunits. Generalizability and dependability statistics were calculated, as sketched in the example below.
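For readers wishing to reproduce this kind of analysis, the following is a minimal sketch (not the paper's actual code) of a variance component analysis for a crossed student × station design using Python's statsmodels. The long-format column names (student, station, score) and the simulated data are illustrative assumptions; the residual term confounds the student × station interaction with measurement error.

```python
# Minimal G-theory sketch for a crossed student x station design.
# Assumptions: long-format data, one row per student-station score;
# column names and simulated variances are hypothetical, not the study's.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_students, n_stations = 73, 3

students = np.repeat(np.arange(n_students), n_stations)
stations = np.tile(np.arange(n_stations), n_students)
score = (rng.normal(0, 1.0, n_students)[students]          # student effect
         + rng.normal(0, 0.8, n_stations)[stations]        # station effect
         + rng.normal(0, 1.2, n_students * n_stations))    # interaction + error
df = pd.DataFrame({"student": students, "station": stations,
                   "score": score, "all": 1})  # single dummy group

# Crossed random effects in statsmodels: one all-encompassing group,
# with each crossed facet entered as a variance component.
vc = {"student": "0 + C(student)", "station": "0 + C(station)"}
model = sm.MixedLM.from_formula("score ~ 1", groups="all",
                                vc_formula=vc, re_formula="0", data=df)
fit = model.fit(reml=True)

# Map estimated variance components back to facet names.
vcomps = dict(zip(model.exog_vc.names, fit.vcomp))
var_p = vcomps["student"]       # person (student) variance
var_s = vcomps["station"]       # station variance
var_resid = fit.scale           # student x station interaction + error

# Relative G coefficient and absolute dependability for a p x s design.
g_rel = var_p / (var_p + var_resid / n_stations)
phi = var_p / (var_p + var_s / n_stations + var_resid / n_stations)
print(f"G (relative) = {g_rel:.2f}, Phi (absolute) = {phi:.2f}")
```

The single dummy group with re_formula="0" is the standard statsmodels idiom for crossed (non-nested) random effects; the study's full design also crosses examiner pairs, which would add further variance components.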
There was no significant difference between mean scores attributable to different examiner pairs across the data. The examiner variance component was greater for the clinical skills score (14.4%) than for the generic skills (5.6%) and global performance (5.1%) scores. The station variance component was largest for the clinical skills score, accounting for 22.9% of the total score variance, compared with 3% for the generic skills and 13.9% for the global performance scores. The student variance component represented 12% of the total variance for clinical skills, 17.4% for generic skills, and 14.3% for global performance ratings. The combined generalizability coefficients across all the data were 0.59 for the clinical skills score, 0.93 for the generic skills score, and 0.75 for global performance.
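For context, these coefficients follow the standard G-theory definitions. As a sketch for a fully crossed person × station (p × s) design (the study's full design also involves examiner pairs, so its exact estimators may differ), the relative generalizability coefficient and the absolute dependability index are

$$E\rho^{2} = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{ps,e}/n_{s}}, \qquad \Phi = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{s}/n_{s} + \sigma^{2}_{ps,e}/n_{s}},$$

where $\sigma^{2}_{p}$ is the student (person) variance, $\sigma^{2}_{s}$ the station variance, $\sigma^{2}_{ps,e}$ the student × station interaction confounded with residual error, and $n_{s}$ the number of stations.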
The combined estimates of relative reliability across all data are greater for the generic skills scores and global performance ratings than for the clinical skills scores. This is likely because content-specific tasks evaluated with checklists produce greater variability in scores than scales evaluating broader competencies. This work can be valuable to other teaching institutions, as monitoring sources of error is a principal quality-control strategy for ensuring valid interpretations of students' scores.