Croen L G, Moroff S V
Albert Einstein College of Medicine, Office of Educational Research and Evaluation, Bronx, NY 10461.
Acad Med. 1994 Apr;69(4):310-2. doi: 10.1097/00001888-199404000-00017.
Scoring protocols for most standardized-patient (SP) examinations have not received extensive scrutiny, and their validity has not been well established.
A holistic method (i.e., one based on raters' overall impressions) of scoring performance on an SP examination was pilot-tested in the spring of 1992 by administering an examination to two cohorts of fourth-year students at the Albert Einstein College of Medicine at Yeshiva University. The examination consisted of eight SP stations, representing a range of medical problems. Two to three experienced clinical teachers independently reviewed all the written material for each encounter. In Phase I of the study, holistic ratings of outstanding, competent, marginal, or inadequate were given for overall clinical competence for a cohort of 16 students; in Phase II, holistic ratings were given separately for data-gathering and communication skills for a cohort of 26 students. Intercase and interrater reliability analyses were performed.
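As an illustration of what an intercase reliability analysis can look like, the sketch below computes Cronbach's alpha across cases from numerically coded holistic ratings. The abstract does not say which coefficient the authors used, so the choice of alpha, the 4-to-1 coding of the rating labels, the function name `cronbach_alpha`, and the example ratings are all assumptions made for illustration and are not data from the study.

```python
from statistics import variance

# Assumed numeric coding of the four holistic rating labels (not from the study).
RATING_CODES = {"outstanding": 4, "competent": 3, "marginal": 2, "inadequate": 1}


def cronbach_alpha(scores):
    """Return Cronbach's alpha for a students-by-cases table of numeric ratings.

    scores: list of rows, one per student, each row holding one rating per case.
    """
    n_students = len(scores)
    n_cases = len(scores[0])
    if n_students < 2 or n_cases < 2:
        raise ValueError("need at least two students and two cases")
    # Sample variance of the ratings within each case (column).
    case_vars = [variance([row[j] for row in scores]) for j in range(n_cases)]
    # Sample variance of each student's total score across all cases.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return n_cases / (n_cases - 1) * (1 - sum(case_vars) / total_var)


# Hypothetical ratings for four students on three cases, for illustration only.
labels = [
    ["outstanding", "competent", "outstanding"],
    ["competent", "competent", "marginal"],
    ["marginal", "inadequate", "marginal"],
    ["competent", "marginal", "competent"],
]
coded = [[RATING_CODES[label] for label in row] for row in labels]
print(round(cronbach_alpha(coded), 3))
```

Interrater agreement would need a separate statistic computed over the two to three raters per encounter; it is not covered by this sketch.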
Adequate reliability coefficients were obtained on a two-hour test; total scores (i.e., students' scores across all eight cases) discriminated between groups of examinees; and scoring an encounter took less than two minutes on average.
Although based on a small sample, the study's results suggest that this holistic method of scoring performance may be useful in some situations. Since experienced clinical teachers know and agree about clinical competence when they see it, developers of scoring protocols for SP examinations need to establish that the results obtained are congruent with the judgments of expert teachers.