Kramer Anneke, Muijtjens Arno, Jansen Koos, Düsman Herman, Tan Lisa, van der Vleuten Cees
National Centre for Evaluation of Postgraduate Training in General Practice (SVUH), Utrecht, the Netherlands.
Med Educ. 2003 Feb;37(2):132-9. doi: 10.1046/j.1365-2923.2003.01429.x.
Earlier studies of absolute standard setting procedures for objective structured clinical examinations (OSCEs) show inconsistent results. This study compared a rational and an empirical standard setting procedure. Reliability and credibility were examined first. The impact of a reality check was then established.
The OSCE included 16 stations and was taken by trainees in their final year of postgraduate training in general practice and by experienced general practitioners (GPs). A modified Angoff procedure (independent judgements, no group discussion), applied with and without a reality check, served as the rational procedure. The borderline regression (BR) method, which is related to the borderline group procedure, served as the empirical procedure. Reliability was assessed using generalisability theory. Credibility was assessed by comparing pass rates and by relating the passing scores to test difficulty.
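The two kinds of procedure can be illustrated with a minimal sketch. It is not the study's implementation. The function names, the 1-to-5 global rating scale, the borderline rating of 2 and all data values are hypothetical. In the usual form of borderline regression, station scores are regressed on examiners' global ratings and the passing score is the predicted score at the borderline rating. In an Angoff-type procedure, judges' estimates for a borderline candidate are averaged.

```python
from statistics import mean


def borderline_regression_cut(ratings, scores, borderline_rating):
    """Fit a least-squares line of station score on global rating and
    return the predicted score at the borderline rating."""
    r_bar, s_bar = mean(ratings), mean(scores)
    sxx = sum((r - r_bar) ** 2 for r in ratings)
    sxy = sum((r - r_bar) * (s - s_bar) for r, s in zip(ratings, scores))
    slope = sxy / sxx
    intercept = s_bar - slope * r_bar
    return intercept + slope * borderline_rating


def angoff_cut(judge_estimates):
    """judge_estimates[j][i] is judge j's estimated probability (0 to 1)
    that a borderline candidate scores item i. Returns a percentage."""
    per_judge = [mean(row) for row in judge_estimates]
    return 100 * mean(per_judge)


# Hypothetical data for one station (assumed 1-5 rating scale, 2 = borderline).
ratings = [1, 2, 3, 4, 5]
scores = [40, 50, 60, 70, 80]  # percentage checklist scores
br_cut = borderline_regression_cut(ratings, scores, borderline_rating=2)

# Hypothetical estimates from two judges on two items.
angoff = angoff_cut([[0.6, 0.8], [0.7, 0.9]])
```

A test-level passing score could then be taken as the mean of the station-level values. That aggregation step is an assumption here, not something the abstract specifies.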
The passing scores were 73.4% for the Angoff procedure without reality check (Angoff I), 66.0% for the Angoff procedure with reality check (Angoff II) and 57.6% for the BR method. The reliabilities (expressed as root mean square errors) were 2.1% for both Angoff I and Angoff II, and 0.6% for the BR method. The pass rates of the trainees and the GPs were 19% and 9% for Angoff I, 66% and 46% for Angoff II, and 95% and 80% for the BR method, respectively. The correlation between test difficulty and passing score was 0.69 for Angoff I, 0.88 for Angoff II and 0.86 for the BR method.
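The two credibility indicators reported here, the pass rate at a given passing score and the correlation between difficulty and passing score, can be computed as in the sketch below. The helper names and every number in it are hypothetical and are not the study's data. Treating a score equal to the passing score as a pass is also an assumption.

```python
from math import sqrt
from statistics import mean


def pass_rate(candidate_scores, passing_score):
    """Percentage of candidates scoring at or above the passing score."""
    passed = sum(1 for s in candidate_scores if s >= passing_score)
    return 100 * passed / len(candidate_scores)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores (percent) for four candidates.
rate = pass_rate([50, 60, 70, 80], passing_score=65)

# Hypothetical per-station mean scores (a difficulty proxy) and passing scores.
station_means = [55, 62, 70, 78]
station_cuts = [48, 55, 60, 69]
r = pearson_r(station_means, station_cuts)
```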
The BR method provides a more credible and reliable standard for an OSCE than a modified Angoff procedure. A reality check improves the credibility of the Angoff procedure but does not improve its reliability.