Department of Radiology, Hospital Universitario Central de Asturias, Avenida de Roma S/N, Oviedo, Asturias, 33011, Spain.
Department of Mathematics, University of Oviedo, Oviedo, Spain.
BMC Med Educ. 2024 Apr 3;24(1):367. doi: 10.1186/s12909-024-05324-2.
Psychometrics plays a vital role in evaluating educational research, including the analysis of multiple-choice exams. This study aims to improve the discriminatory ability of the "Médico Interno Residente" (MIR) medical exam in Spain, used to rank candidates for specialized healthcare training, through psychometric analysis.
We analyzed 2,890 MIR exam questions from 2009 to 2021 (totaling 147,214 exams), categorizing them based on methodology and response type. Evaluation employed classical test theory and item response theory (IRT). Classical test theory determined difficulty and discrimination indices, while IRT assessed the relationship between knowledge levels and question performance.
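As a rough illustration of the two frameworks named above, the sketch below computes a classical difficulty index, a discrimination index, and an IRT item characteristic curve. The abstract does not say which discrimination statistic or which IRT model was used. The point-biserial correlation and the two-parameter logistic (2PL) model are assumptions chosen for illustration, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def difficulty_index(item_scores):
    """Classical difficulty: proportion of correct (1) answers; higher means easier."""
    return mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Discrimination as the point-biserial correlation between a 0/1 item
    and each examinee's total score (assumed statistic, not from the study)."""
    p = mean(item_scores)
    sd = pstdev(total_scores)
    if p in (0, 1) or sd == 0:
        return 0.0  # undefined when everyone answers alike; report 0 here
    mean_correct = mean(t for s, t in zip(item_scores, total_scores) if s == 1)
    mean_incorrect = mean(t for s, t in zip(item_scores, total_scores) if s == 0)
    return (mean_correct - mean_incorrect) / sd * math.sqrt(p * (1 - p))


def icc_2pl(theta, a, b):
    """2PL item characteristic curve (assumed model): probability of a correct
    answer at ability theta, with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Toy data: four examinees, one item, and their total exam scores.
item = [1, 1, 0, 0]
totals = [4, 3, 2, 1]
print(difficulty_index(item))
print(point_biserial(item, totals))
print(icc_2pl(0.0, 1.2, 0.0))
```

In this toy data the item separates high scorers from low scorers cleanly, so the point-biserial value is high. An item answered correctly by nearly everyone would have a high difficulty index (easy) and typically little discriminatory power.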
Question distribution varied across categories and years. Frequently addressed knowledge areas included various medical specialties. Non-image-associated clinical cases were the easiest, while case-based clinical questions exhibited the highest discriminatory capacity, differing significantly from image-based cases and negative questions. High-quality questions without images had longer stems but shorter answer choices. Adding images reduced both discriminatory power and question difficulty. Clinical cases with images had shorter stems and longer answer choices.
For improved exam performance, we recommend using a clinical case format followed by direct short-answer questions. Questions should be of low difficulty, providing clear and specific answers based on scientific evidence and avoiding ambiguity. Typical clinical cases with key characteristic features should be presented, excluding uncertain boundaries of medical knowledge. Questions should have lengthy stems and concise answer choices, minimizing speculation. If images are used, they should be typical, clear, consistent with the exam, and presented within clinical cases using clinical semiotics and propaedeutics.