Clauser Brian E, Harik Polina, Margolis Melissa J, Mee Janet, Swygert Kimberly, Rebbecchi Thomas
National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104, USA.
Acad Med. 2008 Oct;83(10 Suppl):S41-4. doi: 10.1097/ACM.0b013e318183cd1d.
This research examined various sources of measurement error in the documentation score component of the United States Medical Licensing Examination (USMLE) Step 2 Clinical Skills examination.
A generalizability theory framework was used to examine the documentation ratings for 847 examinees who completed the USMLE Step 2 Clinical Skills examination during an eight-day period in 2006. Each patient note was scored by two different raters, which allowed for a persons-crossed-with-raters-nested-in-cases design, p × (r:c).
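As a sketch of what this design implies, the standard generalizability-theory decomposition for a p × (r:c) design can be written as follows. The notation is the conventional textbook form and is an assumption here, since the abstract does not give its own symbols: p denotes persons, c cases, r:c raters nested within cases, and n_c and n_r the number of cases and the number of raters per note.

```latex
% Observed-score variance components for the p x (r:c) design
\sigma^2(X_{prc}) = \sigma^2_p + \sigma^2_c + \sigma^2_{r:c} + \sigma^2_{pc} + \sigma^2_{pr:c,e}

% Relative error variance for a mean over n_c cases and n_r raters per note
\sigma^2_\delta = \frac{\sigma^2_{pc}}{n_c} + \frac{\sigma^2_{pr:c,e}}{n_c \, n_r}

% Generalizability coefficient
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta}
```

In this form, the person-by-case term reflects case specificity, while the residual term absorbs rater inconsistency within a case. Only the second term shrinks when more raters score each note.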
The results suggest that inconsistent performance on the part of raters makes a substantially greater contribution to measurement error than case specificity. Double scoring the notes significantly increases precision.
The results provide guidance for improving operational scoring of the patient notes. Double scoring of the notes may produce an increase in the precision of measurement equivalent to that achieved by lengthening the test by more than 50%. The study also cautions researchers that when examining sources of measurement error, inappropriate data-collection designs may result in inaccurate inferences.
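The comparison between double scoring and test lengthening can be illustrated with a small decision-study calculation. The sketch below assumes the standard relative-error formula for a p × (r:c) design. The variance components, the case count, and the function names are hypothetical values chosen for illustration, not estimates reported by the study.

```python
def relative_error_variance(var_pc, var_pr_c, n_cases, n_raters):
    """Relative error variance for a p x (r:c) design.

    var_pc   : person-by-case variance component (case specificity)
    var_pr_c : person-by-rater-within-case component, confounded with residual
    """
    return (var_pc + var_pr_c / n_raters) / n_cases


def equivalent_length_factor(var_pc, var_pr_c, n_raters=2):
    """Factor by which a single-rated test would need to be lengthened
    to match the error variance obtained with n_raters per note."""
    return (var_pc + var_pr_c) / (var_pc + var_pr_c / n_raters)


# Hypothetical variance components (NOT taken from the study): the rater-related
# component is set to three times the case-specificity component.
var_pc = 1.0
var_pr_c = 3.0
n_cases = 12  # illustrative test length

factor = equivalent_length_factor(var_pc, var_pr_c)
print(round(factor, 2))  # 1.6

# Double scoring n_cases notes gives the same error variance as
# single scoring a test that is `factor` times as long.
double = relative_error_variance(var_pc, var_pr_c, n_cases, 2)
longer = relative_error_variance(var_pc, var_pr_c, n_cases * factor, 1)
print(abs(double - longer) < 1e-9)  # True
```

Under these assumed values, double scoring matches a 60% longer single-rated test. In this formula the factor exceeds 1.5 only when the rater-related component is more than twice the person-by-case component.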