Schwiebert P, Davis A
Department of Family Medicine, College of Medicine, University of Oklahoma Health Sciences Center.
Fam Med. 1993 Mar;25(3):182-5.
Although oral examinations are a traditional measure of student performance during clinical clerkships, concerns have been expressed about poor inter-rater reliability on these examinations. This study examines inter-rater agreement in a family medicine clerkship oral examination.
We analyzed oral examination scores awarded to a class of 98 junior medical students by three categories of examiner (the clerkship director, full-time faculty, and third-year residents). For each student, agreement among the three examiners' raw scores was assessed using Cronbach's alpha. Mean scores for the year awarded by each of the three groups of examiners were compared using a correlation coefficient and paired t tests. For students with high inter-rater disagreement, examiner narrative comments and the types of questions on which raters disagreed were analyzed.
Overall inter-rater agreement was high (Cronbach's alpha = .875). Paired t tests were nonsignificant between residents and faculty but were significant between residents and the clerkship director and between faculty and the clerkship director. In the small subset of students with little inter-rater agreement, no clear trends were identified to explain reasons for evaluator disagreement.
High levels of inter-rater agreement on a clerkship oral examination can be achieved when several measures to increase that agreement are used.
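For readers unfamiliar with the two statistics named in the methods, the following is a minimal sketch of how Cronbach's alpha and a paired t statistic can be computed, treating each examiner as an "item" and each student as an observation. It is not the authors' analysis code: the function names `cronbach_alpha` and `paired_t` and the sample scores are hypothetical, and the abstract does not report the raw data or the software used.

```python
from math import sqrt
from statistics import mean, stdev, variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of scores.

    ratings: list of rows, one row per student, one score per rater.
    Uses alpha = k/(k-1) * (1 - sum of per-rater variances / variance of row totals).
    """
    k = len(ratings[0])                       # number of raters
    rater_columns = list(zip(*ratings))       # one tuple of scores per rater
    rater_variance_sum = sum(variance(col) for col in rater_columns)
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - rater_variance_sum / total_variance)


def paired_t(a, b):
    """Paired t statistic for two raters scoring the same students."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical scores (director, faculty, resident) for five students;
# these numbers are illustrative only and do not come from the study.
sample = [
    [85, 88, 87],
    [70, 74, 75],
    [92, 95, 93],
    [60, 66, 64],
    [78, 80, 82],
]
alpha = cronbach_alpha(sample)
director = [row[0] for row in sample]
faculty = [row[1] for row in sample]
t_stat = paired_t(director, faculty)
```

An alpha near 1 indicates that the raters rank students consistently, while the paired t statistic tests whether one rater scores systematically higher or lower than another. The two can diverge, as in the results above, where overall agreement was high even though the clerkship director's mean scores differed significantly from those of the other two groups.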