Test Development Services, National Board of Medical Examiners, Philadelphia, PA, 19104, USA.
Adv Health Sci Educ Theory Pract. 2012 Aug;17(3):325-37. doi: 10.1007/s10459-011-9309-0. Epub 2011 Oct 1.
Examinees who initially fail and later repeat a standardized patient (SP)-based clinical skills exam typically exhibit large score gains on their second attempt, suggesting the possibility that examinees were not well measured on one of those attempts. This study evaluates score precision for examinees who repeated an SP-based clinical skills test administered as part of the US Medical Licensing Examination sequence. Generalizability theory was used as the basis for computing conditional standard errors of measurement (SEM) for individual examinees. Conditional SEMs were computed for approximately 60,000 single-take examinees and 5,000 repeat examinees who completed the Step 2 Clinical Skills Examination® between 2007 and 2009. The study focused exclusively on ratings of communication and interpersonal skills. Conditional SEMs for single-take and repeat examinees were nearly indistinguishable across most of the score scale. US graduates and international medical graduates (IMGs) were measured with equal levels of precision at all score levels, as were examinees with differing levels of skill speaking English. There was no evidence that examinees with the largest score changes were measured poorly on either their first or second attempt. The large score increases for repeat examinees on this SP-based exam probably cannot be attributed to unexpectedly large errors of measurement.
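To illustrate what a conditional SEM for an individual examinee can look like, the following is a minimal sketch under a simplifying assumption that the abstract does not state: a single-facet persons-by-cases design in which each examinee receives one rating per SP encounter. In that setting, the conditional absolute-error SEM for an examinee is the standard error of that examinee's own mean across cases. The function name `conditional_sem` and the example ratings are hypothetical and are not taken from the study, whose actual measurement design may include additional facets.

```python
import math
from statistics import variance


def conditional_sem(case_scores):
    """Conditional SEM for one examinee in an assumed persons-by-cases design.

    Computes sqrt(s^2 / n), where s^2 is the examinee's sample variance
    (n - 1 denominator) across the n cases. This is the standard error of
    the examinee's mean score, not the study's exact procedure.
    """
    n = len(case_scores)
    if n < 2:
        raise ValueError("at least two case scores are required")
    return math.sqrt(variance(case_scores) / n)


# Hypothetical ratings for two examinees on four cases (illustrative only).
consistent = [5, 5, 5, 5]
variable = [4, 5, 6, 5]

sem_consistent = conditional_sem(consistent)
sem_variable = conditional_sem(variable)
```

An examinee whose ratings vary more from case to case receives a larger conditional SEM, which is the sense in which some examinees could be "measured poorly" on a given attempt.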