Insights into the Angoff method: results from a simulation study.

Author information

Shulruf Boaz, Wilkinson Tim, Weller Jennifer, Jones Philip, Poole Phillippa

Affiliations

University of New South Wales, Sydney, Australia.

Otago University Christchurch, Christchurch, New Zealand.

Publication information

BMC Med Educ. 2016 May 4;16:134. doi: 10.1186/s12909-016-0656-7.

DOI:10.1186/s12909-016-0656-7

PMID:27142788

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC4855704/

Abstract

BACKGROUND

In standard-setting techniques involving panels of judges, the attributes of judges may affect the cut-scores. This simulation study modelled the effect of the number of judges and test items, as well as the impact of judges' attributes, such as accuracy, stringency and influence on others, on the precision of the cut-scores.

METHODS

Forty-nine combinations of Angoff panel sizes (N = 5, 10, 15, 20, 30, 50, and 80 judges) and numbers of test items (n = 5, 10, 15, 20, 30, 50, and 80) were simulated. Each combination was simulated 100 times (4,900 simulations in total). The simulation modelled three judge attributes: stringency, accuracy and leadership. The impact of judges' attributes, the number of judges, the number of test items and Angoff's second round (compared with the first) on the precision of a panel's cut-score was measured by the deviation of the panel's cut-score from its true value.
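
The design described above can be illustrated with a minimal sketch. This is not the authors' simulation code: the distributions, the parameter values (`stringency_sd`, `max_error_sd`, the range of true item probabilities) and the function names are assumptions chosen only for illustration. It models stringency as a per-judge systematic offset and accuracy as a per-judge random error size, covers a single Angoff round, and omits the leadership attribute and the second round.

```python
import numpy as np


def simulate_panel(n_judges, n_items, rng, stringency_sd=0.05, max_error_sd=0.10):
    """Simulate one first-round Angoff panel (illustrative model only).

    Returns the absolute deviation of the panel cut-score from the
    true cut-score, both expressed as proportions of the test score.
    """
    # Assumed "true" probability that a borderline examinee answers each item correctly.
    true_p = rng.uniform(0.4, 0.8, size=n_items)
    # Stringency: a systematic offset per judge (negative = stricter, in this sketch).
    stringency = rng.normal(0.0, stringency_sd, size=(n_judges, 1))
    # Accuracy: each judge has an own random-error size; smaller = more accurate.
    error_sd = rng.uniform(0.0, max_error_sd, size=(n_judges, 1))
    noise = rng.normal(0.0, 1.0, size=(n_judges, n_items)) * error_sd
    # Judges' item ratings, kept inside the valid probability range.
    ratings = np.clip(true_p + stringency + noise, 0.0, 1.0)
    # Angoff cut-score: mean rating over all judges and items.
    panel_cut = ratings.mean()
    return abs(panel_cut - true_p.mean())


def run_grid(sizes=(5, 10, 15, 20, 30, 50, 80), reps=100, seed=0):
    """Run every judges-by-items combination `reps` times.

    Returns a dict mapping (n_judges, n_items) to the mean absolute deviation.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for n_judges in sizes:
        for n_items in sizes:
            devs = [simulate_panel(n_judges, n_items, rng) for _ in range(reps)]
            results[(n_judges, n_items)] = float(np.mean(devs))
    return results


if __name__ == "__main__":
    grid = run_grid()
    for (n_judges, n_items), dev in sorted(grid.items()):
        print(f"judges={n_judges:2d} items={n_items:2d} mean |deviation|={dev:.4f}")
```

With the default arguments, `run_grid` covers 7 × 7 = 49 combinations at 100 repetitions each, matching the 4,900 simulations reported. Because this simplified model treats judges as independent, it cannot by itself reproduce findings that depend on judges influencing one another.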

RESULTS

Findings from 4,900 simulated panels supported the Angoff method being both reliable and valid. Unless the number of test items is small, panels of around 15 judges with mixed levels of expertise provide the most precise estimates. Furthermore, if test data are not presented, a second round of decision-making, as used in the modified Angoff method, adds little to precision. A panel that has only experts or only non-experts yields a less precise cut-score than a mixed-expertise panel, suggesting that the optimal composition of an Angoff panel should include a range of judges with diverse expertise and stringency.

CONCLUSIONS

Simulations aim to improve our understanding of the models assessed but they do not describe natural phenomena as they do not use observed data. While the simulations undertaken in this study help clarify how to set cut-scores defensibly, it is essential to confirm these theories in practice.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c2ad/4855704/1532b88a2f98/12909_2016_656_Fig1_HTML.jpg
