

Similar Articles

1. Experts' prediction of item difficulty of multiple-choice questions in the Ethiopian Undergraduate Medicine Licensure Examination.
BMC Med Educ. 2024 Sep 16;24(1):1016. doi: 10.1186/s12909-024-06012-x.
2. The impact of repeated item development training on the prediction of medical faculty members' item difficulty index.
BMC Med Educ. 2024 May 30;24(1):599. doi: 10.1186/s12909-024-05577-x.
3. Performance of the Ebel standard-setting method in the spring 2019 Royal College of Physicians and Surgeons of Canada internal medicine certification examination consisting of multiple-choice questions.
J Educ Eval Health Prof. 2020;17:12. doi: 10.3352/jeehp.2020.17.12. Epub 2020 Apr 20.
4. Quality of multiple-choice questions in medical internship qualification examination determined by item response theory at Debre Tabor University, Ethiopia.
BMC Med Educ. 2022 Aug 22;22(1):635. doi: 10.1186/s12909-022-03687-y.
5. Prediction of Osteopathic Medical School Performance on the basis of MCAT score, GPA, sex, undergraduate major, and undergraduate institution.
J Am Osteopath Assoc. 2012 Apr;112(4):175-81.
6. Leveraging Natural Language Processing: Toward Computer-Assisted Scoring of Patient Notes in the USMLE Step 2 Clinical Skills Exam.
Acad Med. 2019 Mar;94(3):314-316. doi: 10.1097/ACM.0000000000002558.
7. Minimum accepted competency examination: test item analysis.
BMC Med Educ. 2022 May 25;22(1):400. doi: 10.1186/s12909-022-03475-8.
8. Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
9. A modified electronic key feature examination for undergraduate medical students: validation threats and opportunities.
Med Teach. 2005 Aug;27(5):450-5. doi: 10.1080/01421590500078471.
10. Can physician examiners overcome their first impression when examinee performance changes?
Adv Health Sci Educ Theory Pract. 2018 Oct;23(4):721-732. doi: 10.1007/s10459-018-9823-4. Epub 2018 Mar 20.

Cited By

1. Automatic- and Transformer-Based Automatic Item Generation: A Critical Review.
J Intell. 2025 Aug 12;13(8):102. doi: 10.3390/jintelligence13080102.

References

1. Quality of multiple-choice questions in medical internship qualification examination determined by item response theory at Debre Tabor University, Ethiopia.
BMC Med Educ. 2022 Aug 22;22(1):635. doi: 10.1186/s12909-022-03687-y.
2. Using the Angoff method to set a standard on mock exams for the Korean Nursing Licensing Examination.
J Educ Eval Health Prof. 2020;17:14. doi: 10.3352/jeehp.2020.17.14. Epub 2020 Apr 22.
3. Using clinical simulation to study how to improve quality and safety in healthcare.
BMJ Simul Technol Enhanc Learn. 2020 Mar 4;6(2):87-94. doi: 10.1136/bmjstel-2018-000370. Epub 2018 Sep 29.
4. Peer review improves psychometric characteristics of multiple choice questions.
Med Teach. 2017 Apr;39(sup1):S50-S54. doi: 10.1080/0142159X.2016.1254743. Epub 2017 Jan 20.
5. Do they know too little? An inter-institutional study on the anatomical knowledge of upper-year medical students based on multiple choice questions of a progress test.
Ann Anat. 2017 Jan;209:93-100. doi: 10.1016/j.aanat.2016.09.004. Epub 2016 Oct 13.
6. Trends in national licensing examinations in medicine.
Med Educ. 2016 Jan;50(1):101-14. doi: 10.1111/medu.12810.
7. Twelve Tips for programmatic assessment.
Med Teach. 2015 Jul;37(7):641-646. doi: 10.3109/0142159X.2014.973388. Epub 2014 Nov 20.
8. Standard setting in medical education: fundamental concepts and emerging challenges.
Med J Islam Repub Iran. 2014 May 19;28:34. eCollection 2014.
9. Analysis of one-best MCQs: the difficulty index, discrimination index and distractor efficiency.
J Pak Med Assoc. 2012 Feb;62(2):142-7.
10. Programmatic assessment: From assessment of learning to assessment for learning.
Med Teach. 2011;33(6):478-85. doi: 10.3109/0142159X.2011.565828.

Experts' prediction of item difficulty of multiple-choice questions in the Ethiopian Undergraduate Medicine Licensure Examination.

Author Affiliations

Institute of Health, Jimma University, Jimma, Ethiopia.

Faculty of Medicine, Institute of Health and Society, University of Oslo, Oslo, Norway.

Publication Information

BMC Med Educ. 2024 Sep 16;24(1):1016. doi: 10.1186/s12909-024-06012-x.

DOI: 10.1186/s12909-024-06012-x
PMID: 39285419
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11407004/
Abstract

BACKGROUND

The ability of experts' item difficulty ratings to predict test-takers' actual performance is an important aspect of licensure examinations. Expert judgment serves as a primary source of information for making decisions in advance about the pass rate of test takers, and the nature of the raters involved in predicting item difficulty is central to setting credible standards. This study therefore aimed to assess and compare raters' predicted and actual multiple-choice question difficulty on the Undergraduate Medicine Licensure Examination (UGMLE) in Ethiopia.

METHOD

Responses from 815 examinees to 200 multiple-choice questions (MCQs) were used in this study, together with item difficulty ratings from seven physicians who participated in the standard setting of the UGMLE. Analysis was then conducted to understand variation in the experts' ratings when predicting examinees' actual difficulty levels. Descriptive statistics were used to profile the mean rater and actual difficulty values for the MCQs, and ANOVA was used to compare mean differences between raters' item difficulty predictions. Additionally, regression analysis was used to examine inter-rater variation in item difficulty predictions relative to actual difficulty, and to compute the proportion of variance in actual difficulty explained by rater predictions.
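For context, the "actual difficulty" of an MCQ in classical item analysis is typically its difficulty index: the proportion of examinees who answer it correctly. A minimal sketch of that computation, using hypothetical 0/1 response data rather than the study's dataset:

```python
# Illustrative sketch (not the authors' code): classical item difficulty
# index, i.e. the proportion of examinees answering each MCQ correctly.

def item_difficulty(responses):
    """responses: one list of 0/1 scores per examinee, one column per item.
    Returns the difficulty index (proportion correct) for each item."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_examinees
        for i in range(n_items)
    ]

# Hypothetical data: 4 examinees x 3 items (the study used 815 x 200).
responses = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 0],
]
print(item_difficulty(responses))  # [0.75, 0.75, 0.5]
```

Raters' predictions of these per-item proportions are what the study compares against the observed values.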

RESULTS

In this study, the mean difference between raters' predictions and examinees' actual performance was inconsistent across exam domains. There was a statistically significant, strong positive correlation between actual and predicted item difficulty in exam domains eight and eleven, but a non-significant, very weak positive correlation in domains seven and twelve. Multiple-comparison analysis showed significant differences between raters in mean item difficulty ratings. In the regression analysis, experts' item difficulty ratings explained 33% of the variance in actual difficulty on the UGMLE. The regression model showed a moderate positive correlation (R = 0.57) that was statistically significant, F(6, 193) = 15.58, P = 0.001.
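The reported figures are internally consistent, which can be checked with a little arithmetic (illustrative only, not a computation from the paper's data): a multiple correlation of R = 0.57 implies R² ≈ 0.32, matching the ~33% of variance explained, and with 6 predictors (the seven raters' ratings yield 6 model degrees of freedom here) and 193 residual degrees of freedom the implied F statistic is close to the reported 15.58.

```python
# Consistency check on the reported statistics (illustrative arithmetic).
R = 0.57                      # reported multiple correlation
k, df_resid = 6, 193          # model and residual degrees of freedom, F(6, 193)

r2 = R ** 2                   # proportion of variance explained
F = (r2 / k) / ((1 - r2) / df_resid)

print(round(r2, 3), round(F, 2))  # 0.325 15.48  (reported: ~0.33 and 15.58)
```

The small gap between 15.48 and the reported 15.58 is expected, since R = 0.57 is itself rounded.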

CONCLUSION

This study demonstrated the complexity of assessing the difficulty level of MCQs on the UGMLE and emphasized the benefits of obtaining experts' ratings in advance. To ensure the exam continues to yield reliable and valid scores, raters' accuracy on the UGMLE must be improved; achieving this will require techniques that keep pace with evolving assessment methodologies.
