Insights into the Angoff method: results from a simulation study.

Author information

Shulruf Boaz, Wilkinson Tim, Weller Jennifer, Jones Philip, Poole Phillippa

Affiliations

University of New South Wales, Sydney, Australia.

Otago University Christchurch, Christchurch, New Zealand.

Publication information

BMC Med Educ. 2016 May 4;16:134. doi: 10.1186/s12909-016-0656-7.

Abstract

BACKGROUND

In standard-setting techniques involving panels of judges, the attributes of judges may affect the cut-scores. This simulation study modelled how the number of judges and test items, together with judges' attributes such as accuracy, stringency and influence on others, affect the precision of the cut-scores.

METHODS

Forty-nine combinations of Angoff panel size (N = 5, 10, 15, 20, 30, 50 and 80 judges) and test length (n = 5, 10, 15, 20, 30, 50 and 80 items) were simulated. Each combination was simulated 100 times (4900 simulations in total). The simulated judges' attributes were stringency, accuracy and leadership. The impact of judges' attributes, number of judges, number of test items and Angoff's second round (compared with the first) on the precision of a panel's cut-score was measured by the deviation of the panel's cut-score from its true value.
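The design described above can be illustrated with a minimal sketch. The grid of panel sizes, item counts and replications comes from the abstract; everything else is an assumption made for illustration and is not the authors' model. In particular, the judge model (a stringency offset plus accuracy-dependent noise around a true item value), the numeric ranges, and the function names `simulate_panel` and `run_grid` are hypothetical. Leadership and the second Angoff round are not modelled here.

```python
import itertools
import random
import statistics

PANEL_SIZES = [5, 10, 15, 20, 30, 50, 80]  # N, from the abstract
ITEM_COUNTS = [5, 10, 15, 20, 30, 50, 80]  # n, from the abstract
REPLICATIONS = 100                          # per combination, from the abstract


def simulate_panel(n_judges, n_items, rng):
    """Return |panel cut-score - true cut-score| for one simulated panel.

    Assumed model (not from the paper): each item has a true Angoff value,
    and each judge rates it with a personal stringency offset plus random
    noise whose spread stands in for that judge's (in)accuracy.
    """
    true_items = [rng.uniform(0.3, 0.9) for _ in range(n_items)]
    true_cut = statistics.fmean(true_items)

    judge_cuts = []
    for _ in range(n_judges):
        stringency = rng.gauss(0.0, 0.05)   # assumed spread of stringency
        noise_sd = rng.uniform(0.02, 0.15)  # assumed range of accuracy
        ratings = [
            min(1.0, max(0.0, t + stringency + rng.gauss(0.0, noise_sd)))
            for t in true_items
        ]
        judge_cuts.append(statistics.fmean(ratings))

    # The panel cut-score is taken as the mean of the judges' cut-scores.
    panel_cut = statistics.fmean(judge_cuts)
    return abs(panel_cut - true_cut)


def run_grid(panel_sizes=PANEL_SIZES, item_counts=ITEM_COUNTS,
             replications=REPLICATIONS, seed=0):
    """Mean absolute deviation for every (judges, items) combination."""
    rng = random.Random(seed)
    results = {}
    for n_judges, n_items in itertools.product(panel_sizes, item_counts):
        deviations = [
            simulate_panel(n_judges, n_items, rng)
            for _ in range(replications)
        ]
        results[(n_judges, n_items)] = statistics.fmean(deviations)
    return results
```

Calling `run_grid()` with the defaults covers all 49 combinations at 100 replications each (4900 simulated panels), and `run_grid()[(15, 30)]` would give the mean deviation for 15 judges and 30 items under these assumed parameters. The numbers it produces reflect the assumptions above, not the study's findings.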

RESULTS

Findings from 4900 simulated panels supported Angoff being both reliable and valid. Unless the number of test items is small, panels of around 15 judges with mixed levels of expertise provide the most precise estimates. Furthermore, if test data were not presented, a second round of decision-making, as used in the modified Angoff, adds little to precision. A panel which has only experts or only non-experts yields a cut-score which is less precise than a cut-score yielded by a mixed-expertise panel, suggesting that optimal composition of an Angoff panel should include a range of judges with diverse expertise and stringency.

CONCLUSIONS

Simulations aim to improve our understanding of the models assessed but they do not describe natural phenomena as they do not use observed data. While the simulations undertaken in this study help clarify how to set cut-scores defensibly, it is essential to confirm these theories in practice.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2ad/4855704/1532b88a2f98/12909_2016_656_Fig1_HTML.jpg
