Kalet A, Earp J A, Kowlowitz V
Robert Wood Johnson Clinical Scholars Program, Department of Medicine, University of North Carolina, Chapel Hill.
J Gen Intern Med. 1992 Sep-Oct;7(5):499-505. doi: 10.1007/BF02599452.
OBJECTIVE: To study the reliability and validity of using medical school faculty in the evaluation of the interviewing skills of medical students.
DESIGN: All second-year University of North Carolina medical students (n = 159) were observed by one of eight experienced clinical faculty members while interviewing a standardized patient for 5 minutes. Interview quality was assessed with a faculty checklist covering questioning style, facilitative behaviors, and specific content. Twenty-one randomly chosen students were videotaped and rated: by the original rater and four other raters; by two nationally recognized experts; and according to Roter's coding dimensions, which have been found to correlate strongly with patient compliance and satisfaction.
SETTING: Medical school at a state university in the southeastern United States.
PARTICIPANTS: Faculty members who volunteered to evaluate second-year medical students during an annual Objective Structured Clinical Exam.
MEASUREMENTS: Interrater reliability and intrarater reliability were tested using videotapes of medical students interviewing a standardized patient. Validity was tested by comparing the faculty judgments with both an analysis using the Roter Interactional Analysis System and an assessment made by expert interviewers.
MAIN RESULTS: The mean faculty checklist score was 80% (range 41-100%). Intrarater reliability was poor for assessments of skills and behaviors compared with assessments of content. Interrater reliability was also poor, with intraclass correlation coefficients ranging from 0.11 to 0.37. Compared with the experts, faculty raters had a sensitivity of 80% but a specificity of only 45% in identifying students with adequate skills; the predictive value of the faculty assessment was 12%. Analysis using Roter's coding scheme suggests that faculty scored students on likability rather than on specific behavioral skills, limiting their ability to provide behaviorally specific feedback.
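The sensitivity, specificity, and predictive-value figures above are linked by Bayes' rule. A minimal sketch of that arithmetic follows; the 90% prevalence of adequate skills used below is a hypothetical illustration, not a figure reported by the study:

```python
# Positive and negative predictive value from sensitivity, specificity,
# and prevalence via Bayes' rule. The prevalence is hypothetical; the
# abstract does not report one.

def ppv(sens: float, spec: float, prev: float) -> float:
    """Probability that a positive (adequate-skills) call is correct."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens: float, spec: float, prev: float) -> float:
    """Probability that a negative (inadequate-skills) call is correct."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# Faculty figures from the abstract: sensitivity 80%, specificity 45%.
# With a hypothetical 90% prevalence of adequate skills:
print(round(ppv(0.80, 0.45, 0.90), 2))  # → 0.93
print(round(npv(0.80, 0.45, 0.90), 2))  # → 0.2
```

Under assumptions like these, the low specificity means a faculty call of "inadequate" is usually wrong when most students are in fact adequate, which is consistent with the low predictive value the study reports.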
CONCLUSIONS: To accurately evaluate clinical interviewing skills, we must enhance rater consistency, particularly in assessing those skills that both satisfy patients and yield crucial data.