Scoring Single-Response Multiple-Choice Items: Scoping Review and Comparison of Different Scoring Methods.

Author information

Kanzow Amelie Friederike, Schmidt Dennis, Kanzow Philipp

Affiliations

Study Deanery, University Medical Center Göttingen, Göttingen, Germany.

Department of Preventive Dentistry, Periodontology and Cariology, University Medical Center Göttingen, Göttingen, Germany.

Publication information

JMIR Med Educ. 2023 May 19;9:e44084. doi: 10.2196/44084.

Abstract

BACKGROUND

Single-choice items (eg, best-answer items, alternate-choice items, single true-false items) are one type of multiple-choice item and have been used in examinations for over 100 years. At the end of every examination, examinees' responses must be analyzed and scored to derive information about their true knowledge.

OBJECTIVE

The aim of this paper is to compile the scoring methods for individual single-choice items described in the literature. Furthermore, it analyzes the expected chance score (the expected credit from random guessing) and the relation between examinees' true knowledge and their expected scoring results (averaged percentage score). Finally, it derives implications for pass marks that could be used in examinations to test whether examinees reach a predefined level of true knowledge.

METHODS

Scoring methods for individual single-choice items were extracted from the literature by searching several databases (ERIC, PsycInfo, Embase via Ovid, and MEDLINE via PubMed) in September 2020. Eligible sources reported on scoring methods for individual single-choice items in written examinations, including but not limited to medical education. For each identified scoring method, the expected chance score and the expected scoring results as a function of examinees' true knowledge were calculated using fictitious examinations with 100 single-choice items, separately for items with n=2 answer options (eg, alternate-choice items, single true-false items) and for best-answer items with n=5 answer options (eg, Type A items).
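
The expected chance score can be illustrated in a few lines. This is a minimal sketch, not the authors' code: it assumes each item has exactly one correct option among n, that the examinee guesses uniformly at random and omits nothing, and that a scoring method can be reduced to two hypothetical parameters, `credit_correct` and `credit_incorrect`. Methods that also credit omissions or apply other rules would need additional terms.

```python
from fractions import Fraction


def expected_chance_score(n_options, credit_correct, credit_incorrect):
    """Expected credit per item from pure random guessing (hypothetical helper).

    Assumes one correct option out of `n_options`, a uniform random guess,
    and no omissions. Credits may be ints, Fractions, or decimal strings.
    """
    if n_options < 2:
        raise ValueError("an item needs at least 2 answer options")
    p_correct = Fraction(1, n_options)  # chance of hitting the keyed option
    return (p_correct * Fraction(credit_correct)
            + (1 - p_correct) * Fraction(credit_incorrect))


if __name__ == "__main__":
    for n in (2, 5):
        number_right = expected_chance_score(n, 1, 0)                   # +1 / 0
        formula = expected_chance_score(n, 1, Fraction(-1, n - 1))      # +1 / -1/(n-1)
        harsh = expected_chance_score(n, 1, -3)  # hypothetical +1 / -3 scheme
        print(n, float(number_right), float(formula), float(harsh))
    # 2 0.5 0.0 -1.0
    # 5 0.2 0.0 -2.2
```

Exact fractions avoid floating-point noise such as 0.2 - 2.4. With number-right scoring the chance score is 1/n, while the classic correction-for-guessing scheme (deducting 1/(n-1) per wrong mark) brings it to 0 for any n. The hypothetical +1/-3 scheme yields -1 and -2.2, which coincide numerically with the lower bounds reported in the Results.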

RESULTS

A total of 21 different scoring methods were identified from the 258 included sources; they differ in how they credit correctly marked, omitted, and incorrectly marked items. The credit awarded ranged from -3 to +1 credit points per item. For items with n=2 answer options, expected chance scores from random guessing ranged from -1 to +0.75 credit points; for items with n=5 answer options, they ranged from -2.2 to +0.84 credit points. All scoring methods showed a linear relation between examinees' true knowledge and the expected scoring results. Examination results differed considerably depending on the scoring method used: for examinees with 50% true knowledge, expected scoring results ranged from 0.0% (95% CI 0% to 0%) to 87.5% (95% CI 81.0% to 94.0%) for items with n=2, and from -60.0% (95% CI -60% to -60%) to 92.0% (95% CI 86.7% to 97.3%) for items with n=5.
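
A linear relation of this kind follows if one assumes that examinees mark every truly known item correctly and guess uniformly at random on every unknown item, so the expected result is the chance score plus a knowledge-proportional share. The sketch below encodes that assumption only; the function name and parameters are hypothetical, and the paper's exact examinee model and its confidence intervals are not reproduced here.

```python
from fractions import Fraction


def expected_percentage_score(true_knowledge, n_options, credit_correct, credit_incorrect):
    """Expected score as % of the maximum credit, for a share `true_knowledge` of known items.

    Hypothetical model: known items are marked correctly; every unknown item
    is guessed uniformly at random among `n_options`; nothing is omitted.
    Arguments go through str() so floats such as 0.6 stay exact decimals.
    """
    k = Fraction(str(true_knowledge))
    a = Fraction(str(credit_correct))      # credit for a correct mark (maximum per item)
    b = Fraction(str(credit_incorrect))    # credit for an incorrect mark
    p = Fraction(1, n_options)
    chance = p * a + (1 - p) * b           # expected credit of one random guess
    # Linear in k: intercept = chance, slope = a - chance; divide by a for a percentage.
    return float(100 * (k * a + (1 - k) * chance) / a)


if __name__ == "__main__":
    for n in (2, 5):
        print(n,
              expected_percentage_score(0.5, n, 1, 0),    # number-right scoring
              expected_percentage_score(0.5, n, 1, -3))   # hypothetical +1/-3 scheme
    # 2 75.0 0.0
    # 5 60.0 -60.0
```

At 50% true knowledge, number-right scoring gives 75.0% (n=2) and 60.0% (n=5), while the hypothetical +1/-3 scheme gives 0.0% and -60.0%, matching numerically the lower bounds reported above. Only under a correction-for-guessing scheme (chance score 0) does the expected result equal true knowledge.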

CONCLUSIONS

In examinations with single-choice items, the scoring result is not always equivalent to examinees' true knowledge. When interpreting examination scores and setting pass marks, the number of answer options per item must usually be taken into account in addition to the scoring method used.
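
To make the pass-mark implication concrete: if the expected result is linear in true knowledge, the pass mark for a target knowledge level is that line evaluated at the target, and an observed score can be mapped back to the knowledge it implies. This minimal sketch assumes that known items are marked correctly and unknown items are guessed uniformly at random with no omissions; the 60% target and all function names are hypothetical.

```python
from fractions import Fraction


def _chance(n_options, a, b):
    """Expected credit of one uniform random guess (a = correct, b = incorrect credit)."""
    p = Fraction(1, n_options)
    return p * a + (1 - p) * b


def pass_mark_for(required_knowledge, n_options, credit_correct=1, credit_incorrect=0):
    """Expected % score of an examinee whose true knowledge equals the requirement."""
    k, a, b = (Fraction(str(x)) for x in (required_knowledge, credit_correct, credit_incorrect))
    return float(100 * (k * a + (1 - k) * _chance(n_options, a, b)) / a)


def implied_true_knowledge(score_percent, n_options, credit_correct=1, credit_incorrect=0):
    """Invert the linear relation: share of known items implied by an observed % score."""
    s, a, b = (Fraction(str(x)) for x in (score_percent, credit_correct, credit_incorrect))
    chance = _chance(n_options, a, b)
    if a == chance:
        raise ValueError("score does not depend on knowledge under this scheme")
    return float((s / 100 * a - chance) / (a - chance))


if __name__ == "__main__":
    # Hypothetical requirement: 60% true knowledge under number-right scoring.
    for n in (2, 5):
        print(n, pass_mark_for(0.6, n))
    # 2 80.0
    # 5 68.0
    print(implied_true_knowledge(68, 5))  # 0.6
```

Under number-right scoring, the same 60% knowledge requirement corresponds to an 80% pass mark with n=2 but 68% with n=5, illustrating why the number of answer options has to be considered alongside the scoring method.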

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b84/10238964/a4017c3beab1/mededu_v9i1e44084_fig1.jpg
