Threats to validity in the use and interpretation of script concordance test scores.

Author information

Department of Medical Education, University of Illinois at Chicago, Chicago, Illinois, USA.

Publication information

Med Educ. 2013 Dec;47(12):1175-83. doi: 10.1111/medu.12283.

Abstract

CONTEXT

Recent reviews have claimed that the script concordance test (SCT) methodology generally produces reliable and valid assessments of clinical reasoning and that the SCT may soon be suitable for high-stakes testing.

OBJECTIVES

This study is intended to describe three major threats to the validity of the SCT not yet considered in prior research and to illustrate the severity of these threats.

METHODS

We conducted a review of SCT reports available through the Web of Science database. Additionally, we reanalysed scores from a previously published SCT administration to explore issues related to standard SCT scoring practice.

RESULTS

Firstly, the predominant method for aggregate and partial credit scoring of SCTs introduces logical inconsistencies in the scoring key. Secondly, our literature review shows that SCT reliability studies have generally ignored inter-panel, inter-panellist and test-retest measurement error. Instead, studies have focused on observed levels of coefficient alpha, which is neither an informative index of internal structure nor a comprehensive index of reliability for SCT scores. As such, claims that SCT scores show acceptable reliability are premature. Finally, SCT criteria for item inclusion, in concert with a statistical artefact of the SCT format, cause anchors at the extremes of the scale to have less expected credit than anchors near or at the midpoint. Consequently, SCT scores are likely to reflect construct-irrelevant differences in examinees' response styles. This makes the test susceptible to bias against candidates who endorse extreme scale anchors more readily; it also makes two construct-irrelevant test taking strategies extremely effective. In our reanalysis, we found that examinees could drastically increase their scores by never endorsing extreme scale points. Furthermore, examinees who simply endorsed the scale midpoint for every item would still have outperformed most examinees who used the scale as it is intended.
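The scoring issues described above can be illustrated with a minimal sketch of aggregate partial-credit scoring as it is commonly described for SCTs: each anchor earns credit equal to the number of panellists who chose it divided by the number who chose the modal anchor. The five-point scale, the panel responses and the function names below are assumptions invented for illustration; they are not data or code from the study.

```python
from collections import Counter

# Assumed five-point scale; real SCTs may label anchors differently.
SCALE = [-2, -1, 0, 1, 2]


def aggregate_key(panel_responses, scale=SCALE):
    """Build an aggregate scoring key for one item.

    Credit for an anchor = (panellists choosing it) / (panellists
    choosing the modal anchor), so the modal anchor earns 1.0.
    """
    counts = Counter(panel_responses)
    modal_count = max(counts.values())
    return {anchor: counts.get(anchor, 0) / modal_count for anchor in scale}


def score(responses, keys):
    """Sum the credit an examinee earns across items."""
    return sum(key[r] for r, key in zip(responses, keys))


# Hypothetical panel responses for three items (ten panellists each).
panels = [
    [-1, -1, 0, 0, 0, 1, -2, 0, -1, 1],
    [1, 1, 1, 2, 0, 0, 1, 0, 2, 1],
    [-1, 0, 0, -1, -1, -2, 0, -1, 1, -1],
]
keys = [aggregate_key(p) for p in panels]

# A construct-irrelevant strategy: always endorse the midpoint.
midpoint_only = score([0] * len(keys), keys)

# An examinee who always endorses an extreme anchor.
extreme_only = score([-2, 2, -2], keys)

print(keys[0])
print(midpoint_only, extreme_only)
```

In this toy key, the midpoint-only strategy earns roughly 2.2 of a possible 3.0 points, whereas the extreme-only responses earn roughly 0.85. The first item's key also awards partial credit to both -1 and +1, that is, to judgements pointing in opposite directions, which is one way a key built like this can become internally inconsistent. The numbers depend entirely on the invented panel data and say nothing about the size of the effect in the reanalysed administration.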

CONCLUSIONS

Given the severity of these threats, we conclude that aggregate scoring of SCTs cannot be recommended. Recommendations for revisions of SCT methodology are discussed.
