George Sanju, Haque M Sayeed, Oyebode Femi
Queen Elizabeth Psychiatric Hospital, Mindelsohn Way, Edgbaston, Birmingham, UK.
BMC Med Educ. 2006 Sep 14;6:46. doi: 10.1186/1472-6920-6-46.
The outcome of an assessment is determined by the standard-setting method used. Many standard-setting methods exist; the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. This study aimed to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method.
The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 fourth-year medical students on a multiple-choice question (MCQ) examination. Two panels of raters also set the standard for the same MCQ paper using the modified Angoff method on two occasions, 6 months apart. We compared the pass/fail rates derived from the norm-reference and Angoff methods, and assessed the test-retest and inter-rater reliability of the modified Angoff method.
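The two ways of deriving a cut score described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of the sample standard deviation, and the averaging of per-rater percentages are choices made for the example and are not taken from the study, and any data passed in would be hypothetical.

```python
from statistics import mean, stdev


def norm_reference_cut(scores, k=1.0):
    """Norm-reference cut score: cohort mean minus k standard deviations.

    Assumption: the sample SD (n - 1 denominator) is used; the abstract
    does not say whether the sample or population SD was applied.
    """
    return mean(scores) - k * stdev(scores)


def angoff_cut(ratings):
    """Angoff-style cut score as a percentage of the maximum mark.

    ratings[r][i] is rater r's estimate of the probability that a
    borderline candidate answers item i correctly. Each rater's item
    estimates are averaged, then the raters are averaged (an assumed
    aggregation rule for this sketch).
    """
    per_rater = [100 * mean(items) for items in ratings]
    return mean(per_rater)


def pass_rate(scores, cut):
    """Proportion of candidates scoring at or above the cut score."""
    return sum(s >= cut for s in scores) / len(scores)
```

The norm-reference cut score moves with the cohort's own distribution, whereas the Angoff cut score depends only on the raters' judgements of the items, which is why the two methods can yield different pass rates for the same set of scores.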
The pass rate was 85% (66/78) with the norm-reference method and 100% (78/78) with the Angoff method. The percentage agreement between the Angoff and norm-reference methods was 78% (95% CI 69% - 87%). The modified Angoff method had an inter-rater reliability of 0.81-0.82 and a test-retest reliability of 0.59-0.74.
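The reported confidence interval for the percentage agreement can be checked with a normal-approximation (Wald) interval. The abstract does not state which interval method the authors used, so the choice of the Wald formula and the helper name `wald_ci` are assumptions; the sketch only shows that this formula, applied to a proportion of 0.78 with n = 78, gives limits matching those reported.

```python
from math import sqrt


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    p: observed proportion, n: number of candidates, z: critical value
    (1.96 for a 95% interval). Limits are clipped to [0, 1].
    """
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


low, high = wald_ci(0.78, 78)
print(f"{low:.0%} - {high:.0%}")  # prints "69% - 87%"
```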
The two standard-setting methods produced significantly different outcomes, as shown by the difference in the proportions of candidates who passed and failed the assessment. The modified Angoff method had good inter-rater reliability and moderate test-retest reliability.