National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104-3102, USA.
Acad Med. 2011 Oct;86(10 Suppl):S55-8; quiz S58. doi: 10.1097/ACM.0b013e31822a6aa2.
A novel type of item set, the "f-type" testlet, was recently introduced on the United States Medical Licensing Examination. These testlets contain two or more questions associated with a common clinical scenario. In some cases, as the scenario unfolds, examinees are indirectly provided with feedback about their response to a testlet question. This study investigates the effects of this format and of providing examinees with feedback about their performance.
Examinee behavior is predicted using an item response model, and observed examinee responses are compared with model expectations for f-type testlets. Mean model-data discrepancies among specific examinee groups are compared to study the dependencies across within-testlet items (i.e., case-specificity) and the impact of providing feedback.
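The comparison described above can be illustrated with a minimal sketch. The abstract does not name the item response model used, so the one-parameter (Rasch) form here is an assumption, and the function names, group labels, and numbers are hypothetical and chosen only for illustration. The sketch computes each response's model-data discrepancy (observed score minus model-expected probability of success) and averages it within examinee groups.

```python
import math
from collections import defaultdict


def rasch_probability(theta, b):
    """Expected probability of a correct response under a Rasch model.

    Assumption: the abstract does not say which item response model was
    used; the Rasch form is only an illustrative stand-in.
    theta: examinee proficiency, b: item difficulty (same logit scale).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def mean_discrepancy_by_group(records):
    """Average (observed - expected) per examinee group.

    records: iterable of (group, theta, b, observed) tuples, where
    observed is 1 for a correct response and 0 otherwise.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, theta, b, observed in records:
        sums[group] += observed - rasch_probability(theta, b)
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical responses to a follow-up item within a testlet, grouped by
# outcome on the initial item and whether feedback was received.
example_records = [
    ("initial_correct", 0.5, 0.0, 1),
    ("initial_correct", 0.2, 0.0, 1),
    ("initial_incorrect_no_feedback", 0.5, 0.0, 0),
    ("initial_incorrect_no_feedback", 0.2, 0.0, 0),
    ("initial_incorrect_feedback", 0.5, 0.0, 1),
    ("initial_incorrect_feedback", 0.2, 0.0, 0),
]

if __name__ == "__main__":
    for group, value in mean_discrepancy_by_group(example_records).items():
        print(group, round(value, 3))
```

Under this sketch, a positive group mean indicates that the group did better on the follow-up item than the model expected, a negative mean indicates that it did worse, and a mean near zero is consistent with no dependency beyond proficiency and item difficulty.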
Findings showed that case-specificity effects were present (on average) for all examinee subgroups except examinees who both responded unsuccessfully to the initial item within an f-type testlet and received feedback. Case-specificity effects were negative for examinees who responded unsuccessfully to the initial testlet item but did not receive feedback. For those who responded successfully to the initial testlet item, case-specificity effects were positive.
Results suggest that responses to test questions within an f-type testlet are not independent, even after accounting for examinee proficiency and item characteristics. Case-specificity effects (i.e., dependencies) were observed on average for all examinees except those who both responded unsuccessfully to the initial item within an f-type testlet and received feedback. Research into modeling these effects through the use of more general item response models is recommended.