Shatzer J H, Darosa D, Colliver J A, Barkmeier L
University of Illinois College of Medicine, Urbana-Champaign.
Acad Med. 1993 Mar;68(3):224-9. doi: 10.1097/00001888-199303000-00016.
To directly compare the generalizability of medical students' performance scores under systematically varied station times in two surgery end-of-clerkship performance-based examinations.
The participants were 36 third-year students randomly assigned to the first two rotations of the core surgery clerkship during 1991-92 at Southern Illinois University School of Medicine. The students rotated through a 12-station examination that employed standardized patients (SPs). In the first rotation, the students took six five-minute stations and six ten-minute stations. In the second rotation, the time lengths were reversed for the same stations. The students' total scores were based on two subscores. The first came from checklists completed by the SPs. The second came from the students' written responses to short questions about each station; these responses were written at five-minute couplet stations, regardless of station length. Generalizability coefficients were computed from the pooled results of both rotations to estimate the reliability of scores at each station length.
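The abstract does not specify the G-study design, so the following is only a minimal sketch under a simplifying assumption. It assumes a fully crossed students × stations (p × s) design with one score per cell. The study's actual model, with checklist items, couplets, and pooled rotations, may well be more complex. The function name `g_study_p_by_s` and the score matrices are hypothetical illustrations, not data from the study. The function estimates variance components from a two-way ANOVA without replication and returns the relative generalizability coefficient for a test of `n_s` stations.

```python
import math
import statistics


def g_study_p_by_s(scores):
    """Hypothetical helper: variance components and relative G coefficient
    for a fully crossed persons x stations design, one score per cell.

    scores: list of rows, one row per student, one column per station.
    """
    n_p = len(scores)
    n_s = len(scores[0])
    if n_p < 2 or n_s < 2 or any(len(row) != n_s for row in scores):
        raise ValueError("need a rectangular matrix with >= 2 students and >= 2 stations")

    grand = statistics.fmean(x for row in scores for x in row)
    person_means = [statistics.fmean(row) for row in scores]
    station_means = [statistics.fmean(row[j] for row in scores) for j in range(n_s)]

    # Sums of squares for a two-way ANOVA without replication
    ss_p = n_s * sum((m - grand) ** 2 for m in person_means)
    ss_s = n_p * sum((m - grand) ** 2 for m in station_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_s  # person x station interaction confounded with error

    ms_p = ss_p / (n_p - 1)
    ms_s = ss_s / (n_s - 1)
    ms_res = ss_res / ((n_p - 1) * (n_s - 1))

    # Solve the expected-mean-square equations; negative estimates are set to 0
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_s, 0.0)
    var_s = max((ms_s - ms_res) / n_p, 0.0)

    # Relative G coefficient: person variance over person + relative error variance
    denom = var_p + var_res / n_s
    g_coef = var_p / denom if denom > 0 else math.nan
    return {"var_p": var_p, "var_s": var_s, "var_ps_e": var_res, "g": g_coef}


if __name__ == "__main__":
    # Hypothetical scores for 4 students x 3 stations, built as student effect
    # + station effect + identical residuals; only the student spread differs.
    wide = [[1, 0, 2], [1, 4, 4], [4, 5, 6], [6, 7, 8]]    # student effects 0, 2, 4, 6
    narrow = [[1, 0, 2], [0, 3, 3], [2, 3, 4], [3, 4, 5]]  # student effects 0, 1, 2, 3
    print(round(g_study_p_by_s(wide)["g"], 3))    # 0.967
    print(round(g_study_p_by_s(narrow)["g"], 3))  # 0.867
```

In this simple crossed design, the relative G coefficient equals Cronbach's alpha. The usage example holds the residual variance fixed and halves the spread of student effects, which lowers the coefficient.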
Generalizability was lower for the ten-minute stations, mostly because of less variability among students' performances. The checklist subscores accounted for most of this reduction in variability, whereas the couplet subscores remained stable across station lengths.
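The formula below shows why less variability among students can lower generalizability. It assumes the same simplified fully crossed students × stations design. Here $\sigma^2_p$ is the variance among students, $\sigma^2_{ps,e}$ is the student-by-station interaction confounded with error, and $n_s$ is the number of stations. These are illustrative assumptions, not the study's reported model. The numbers are hypothetical.

```latex
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{ps,e}/n_s}

% Hypothetical: n_s = 6, \sigma^2_{ps,e} = 6, so \sigma^2_{ps,e}/n_s = 1
\sigma^2_p = 4:\quad E\rho^2 = \frac{4}{4 + 1} = 0.80
\qquad
\sigma^2_p = 2:\quad E\rho^2 = \frac{2}{2 + 1} \approx 0.67
```

When the error term stays the same, shrinking $\sigma^2_p$ alone lowers $E\rho^2$.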
The longer station length actually decreased the generalizability of the scores by reducing the variability among students' performances. Thus, the time allocated to stations can affect not only the overall testing time of performance-based examinations but also the reliability of their scores.